zhengfg / zion · Commits · 9c9bfa0d
Commit 9c9bfa0d, authored Nov 01, 2019 by 梁业锦

Optimize the crawler's checks for which links to scrape

Parent: e91ff555
Showing 11 changed files with 386 additions and 214 deletions (+386 -214)
doc/SpiderSpecification.md (+8 -6)
...n/java/com/diaoyun/zion/chinafrica/bis/impl/HmSpider.java (+20 -8)
...java/com/diaoyun/zion/chinafrica/bis/impl/LilySpider.java (+0 -1)
...m/diaoyun/zion/chinafrica/bis/impl/PullandbearSpider.java (+15 -3)
...va/com/diaoyun/zion/chinafrica/bis/impl/UniqloSpider.java (+1 -1)
...java/com/diaoyun/zion/chinafrica/bis/impl/VansSpider.java (+52 -0)
...aoyun/zion/chinafrica/service/impl/SpiderServiceImpl.java (+12 -13)
...va/com/diaoyun/zion/master/util/spider/HMSpiderParse.java (+5 -3)
...aoyun/zion/master/util/spider/PullAndBearSpiderParse.java (+0 -160)
.../java/com/diaoyun/zion/master/util/spider/SpiderUtil.java (+136 -19)
.../com/diaoyun/zion/master/util/spider/VansSpiderParse.java (+137 -0)
doc/SpiderSpecification.md (view file @ 9c9bfa0d)
@@ -36,17 +36,19 @@
 - Name: gap
 - Crawler progress: **Done**
 - Broken; data can no longer be crawled
 ### [Zara](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/ZaraSpider.java)
 - Home page: https://www.zara.cn/cn
 - Name: zara
 - Crawler progress: **Done**
 - Possible defects:
 ### [Uniqlo(优衣库)](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/UniqloSpider.java)
 - Home page: https://www.uniqlo.cn/UNIQLO_U19FW_MEN.html
 - Name: uniqlo
 - Crawler progress: **Done**
-- Data cannot be crawled from the app
+- Broken
+- Links are protected by anti-crawling measures
 - Possible defects:
 - The image path downloads the image directly
@@ -65,9 +67,8 @@
 ### [H&M](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/HmSpider.java)
 - Home page: https://www2.hm.com/zh_cn/
 - Name: hm
-- Crawler progress: data can already be fetched
-- The JSON is wrapped in a hard-to-handle way; existing tools cannot convert it to JSON format
+- Crawler progress: **Done**
+- Product colors are distinguished by the product-detail page URL; no pattern found yet
 ### LiLy
 - Home page: http://www.lily.sh.cn/webapp/wcs/stores/servlet/lilystore
@@ -138,10 +139,11 @@
 - Home page: https://www.massimodutti.cn/cn/男装/季末折扣/休闲西装-c1745921.html
 - Name: massimodutti
 - Crawler progress: **Done**
+- Broken
+- Links are protected by anti-crawling measures
 - Data sources
 - Product detail: https://www.massimodutti.cn/cn/%E5%A5%B3%E8%A3%85/%E7%B3%BB%E5%88%97/%E8%A1%AC%E8%A1%AB%E5%92%8C%E7%BD%A9%E8%A1%AB/%E8%A1%AC%E8%A1%AB/%E6%BB%91%E9%9B%AA%E9%A3%8E%E7%B3%BB%E5%88%97%E9%A5%B0%E5%8F%A3%E8%A2%8B%E8%A1%AC%E8%A1%AB-c1718602p8730105.html?colorId=420&categoryId=1718602
 - Data API: https://www.massimodutti.cn/itxrest/2/catalog/store/35009478/30359500/category/0/product/8730105/detail?languageId=-7&appId=1
-- App data crawling is broken
 ### [COACH(蔻驰)](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/CoachSpider.java)
 - Home page: https://china.coach.com/women.html
...
src/main/java/com/diaoyun/zion/chinafrica/bis/impl/HmSpider.java (view file @ 9c9bfa0d)
@@ -4,23 +4,15 @@ import com.diaoyun.zion.chinafrica.bis.IItemSpider;
 import com.diaoyun.zion.chinafrica.enums.PlatformEnum;
 import com.diaoyun.zion.chinafrica.vo.ProductResponse;
 import com.diaoyun.zion.master.util.HttpClientUtil;
-import com.diaoyun.zion.master.util.JsoupUtil;
 import com.diaoyun.zion.master.util.TranslateHelper;
 import com.diaoyun.zion.master.util.spider.HMSpiderParse;
 import net.sf.json.JSONObject;
-import org.jsoup.Jsoup;
-import org.jsoup.nodes.Document;
-import org.jsoup.nodes.Element;
-import org.jsoup.select.Elements;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.springframework.stereotype.Component;
-import javax.print.Doc;
 import java.io.IOException;
 import java.net.URISyntaxException;
-import java.util.HashMap;
-import java.util.Map;
 import java.util.concurrent.ExecutionException;
 import java.util.concurrent.TimeoutException;
...
@@ -47,6 +39,8 @@ public class HmSpider implements IItemSpider {
      */
     @Override
     public JSONObject captureItem(String targetUrl) throws URISyntaxException, IOException, ExecutionException, InterruptedException, TimeoutException {
+        String[] spilt = targetUrl.split("productpage.");
+        targetUrl = "https://www2.hm.com/zh_cn/productpage." + spilt[1];
         String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.HM.getValue());
         ProductResponse productResponse = HMSpiderParse.formatProductResponse(content);
         JSONObject resultObj = JSONObject.fromObject(productResponse);
...
@@ -54,4 +48,22 @@ public class HmSpider implements IItemSpider {
         return resultObj;
     }
+//    public static void main(String[] args) throws Exception {
+//        String targetUrl = "https://m2.hm.com/m/zh_cn/productpage.0806412004.html";
+//        String[] spilt = targetUrl.split("productpage.");
+//        targetUrl = "https://www2.hm.com/zh_cn/productpage." + spilt[1];
+//        String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.HM.getValue());
+//        // Extract the main data and convert it to JSON data and a Document object
+//        String detailStr = JsoupUtil.getScriptContent(content, "productArticleDetails");
+//        int firstBrackets = detailStr.indexOf("{");
+//        int lastbrackets = detailStr.lastIndexOf("}");
+//        String resultStr = detailStr.substring(firstBrackets,lastbrackets+1);
+//        resultStr = resultStr.replaceAll("\'", "\"")
+//                .replaceAll("\"image\": isDesktop [?] ", "")
+//                .replaceAll("\"fullscreen\": isDesktop [?] ", "")
+//                .replaceAll("\"zoom\": isDesktop [?] ", "");
+//        JSONObject dataMap = JSONObject.fromObject(resultStr);
+//        Document document = Jsoup.parse(content);
+//    }
 }
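The URL rewrite added to `captureItem` above indexes `spilt[1]` directly, which throws `ArrayIndexOutOfBoundsException` when the URL has no `productpage.` marker. A minimal sketch of a guarded variant follows, assuming a hypothetical helper class `HmUrlNormalizer` that is not part of this commit. It uses `indexOf` instead of the regex-based `split`, so it keeps everything after the first marker; that is a deliberate simplification, not the author's method.

```java
import java.util.Optional;

/** Hypothetical helper (not in the commit): rewrites an H&M product URL to the desktop site. */
final class HmUrlNormalizer {
    private static final String MARKER = "productpage.";
    private static final String DESKTOP_PREFIX = "https://www2.hm.com/zh_cn/productpage.";

    /** Returns the desktop URL, or empty when the input is not a product-page URL. */
    static Optional<String> toDesktopUrl(String targetUrl) {
        if (targetUrl == null) {
            return Optional.empty();
        }
        int idx = targetUrl.indexOf(MARKER);
        if (idx < 0) {
            return Optional.empty();
        }
        // Everything after the marker, e.g. "0806412004.html"
        String tail = targetUrl.substring(idx + MARKER.length());
        if (tail.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(DESKTOP_PREFIX + tail);
    }

    public static void main(String[] args) {
        String mobile = "https://m2.hm.com/m/zh_cn/productpage.0806412004.html";
        System.out.println(toDesktopUrl(mobile).orElse("not an H&M product page"));
    }
}
```

Returning `Optional` lets the caller decide how to report an unsupported link instead of surfacing an index error from deep inside the spider.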
src/main/java/com/diaoyun/zion/chinafrica/bis/impl/LilySpider.java (view file @ 9c9bfa0d)
@@ -48,7 +48,6 @@ public class LilySpider implements IItemSpider {
         String targetUrl = "http://www.lily.sh.cn/webapp/wcs/stores/servlet/lilystore/24003/276409";
         String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.LILY.getValue());
         Document document = Jsoup.parse(content);
-        System.out.println(content);
         System.err.println(document);
     }
 }
src/main/java/com/diaoyun/zion/chinafrica/bis/impl/PullandbearSpider.java (view file @ 9c9bfa0d)
@@ -4,8 +4,8 @@ import com.diaoyun.zion.chinafrica.bis.IItemSpider;
 import com.diaoyun.zion.chinafrica.enums.PlatformEnum;
 import com.diaoyun.zion.chinafrica.vo.ProductResponse;
 import com.diaoyun.zion.master.util.HttpClientUtil;
-import com.diaoyun.zion.master.util.spider.PullAndBearSpiderParse;
 import com.diaoyun.zion.master.util.TranslateHelper;
+import com.diaoyun.zion.master.util.spider.SpiderUtil;
 import net.sf.json.JSONObject;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
...
@@ -33,7 +33,7 @@ public class PullandbearSpider implements IItemSpider {
     /**
      * PullAndBear data crawler
-     * @see PullAndBearSpiderParse#formatProductResponse data-formatting method
+     * @see SpiderUtil#formatPullAndBearProductResponse data-formatting method
      * @param targetUrl the product-detail URL received
      * @return JSON data after formatting and translation
      */
...
@@ -43,13 +43,25 @@ public class PullandbearSpider implements IItemSpider {
         targetUrl = PULL_AND_BEAR_URL + pId + "/detail?languageId=-7&appId=1";
         String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.PULLANDBEAR.getValue());
         JSONObject resultJson = JSONObject.fromObject(content);
-        ProductResponse productResponse = PullAndBearSpiderParse.formatProductResponse(resultJson, pId);
+        ProductResponse productResponse = SpiderUtil.formatPullAndBearProductResponse(resultJson, pId);
         resultJson = JSONObject.fromObject(productResponse);
         // Translate
         TranslateHelper.translateProductResponse(resultJson);
         return resultJson;
     }
+//    /**
+//     * How PullAndBear product-detail data is fetched
+//     * @param args
+//     * @throws Exception
+//     */
+//    public static void main(String[] args) throws Exception {
+//        String targetUrl = "https://www.pullandbear.cn/cn/%25E7%2594%25B7%25E8%25A3%2585/%25E6%259C%258D%25E8%25A3%2585/%25E5%25A4%25A7%25E8%25A1%25A3%25E5%2592%258C%25E5%25A4%25B9%25E5%2585%258B/cazadora-tipo-plumas-costuras-invisibles-c-capucha-c1030204837p501658014.html?cS=800";
+//        String pId = targetUrl.substring(targetUrl.lastIndexOf("p")+1, targetUrl.lastIndexOf(".html"));
+//        targetUrl = PULL_AND_BEAR_URL + pId + "/detail?languageId=-7&appId=1";
+//        String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.PULLANDBEAR.getValue());
+//        System.err.println(content);
+//    }
 }
...
src/main/java/com/diaoyun/zion/chinafrica/bis/impl/UniqloSpider.java (view file @ 9c9bfa0d)
@@ -18,7 +18,7 @@ import java.util.concurrent.TimeoutException;
 /**
  * Uniqlo data crawler
- *
+ * TODO links cannot be read
  * @author 爱酱油不爱醋
  */
 @Component("uniqloSpider")
...
src/main/java/com/diaoyun/zion/chinafrica/bis/impl/VansSpider.java (new file, mode 0 → 100644; view file @ 9c9bfa0d)
+package com.diaoyun.zion.chinafrica.bis.impl;
+
+import com.diaoyun.zion.chinafrica.bis.IItemSpider;
+import com.diaoyun.zion.chinafrica.enums.PlatformEnum;
+import com.diaoyun.zion.chinafrica.vo.ProductResponse;
+import com.diaoyun.zion.master.util.HttpClientUtil;
+import com.diaoyun.zion.master.util.TranslateHelper;
+import com.diaoyun.zion.master.util.spider.SpiderUtil;
+import com.diaoyun.zion.master.util.spider.VansSpiderParse;
+import net.sf.json.JSONObject;
+import org.jsoup.Jsoup;
+import org.jsoup.nodes.Document;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.springframework.stereotype.Component;
+
+import java.io.IOException;
+import java.net.URISyntaxException;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeoutException;
+
+/**
+ * Vans(范斯) data crawler
+ *
+ * @author 爱酱油不爱醋
+ */
+@Component("vansSpider")
+public class VansSpider implements IItemSpider {
+    private static Logger logger = LoggerFactory.getLogger(ZaraSpider.class);
+
+    /**
+     * Vans data crawler
+     * @see VansSpiderParse#formatProductResponse data-formatting method
+     * @param targetUrl the product-detail URL received
+     * @return JSON data after formatting and translation
+     */
+    @Override
+    public JSONObject captureItem(String targetUrl) throws URISyntaxException, IOException, ExecutionException, InterruptedException, TimeoutException {
+        String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.VANS.getValue());
+        Document document = Jsoup.parse(content);
+        String pTitle = document.select("product-titles").text();
+        String[] spilt = targetUrl.split("/");
+        String pId = SpiderUtil.retainNumber(spilt[4]);
+        targetUrl = "https://" + spilt[2] + "/wap/product-ajax_product_spec-" + pId + ".html";
+        content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.VANS.getValue());
+        ProductResponse productResponse = VansSpiderParse.formatProductResponse(content, pId, pTitle);
+        JSONObject resultObj = JSONObject.fromObject(productResponse);
+        TranslateHelper.translateProductResponse(resultObj);
+        return resultObj;
+    }
+}
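`VansSpider.captureItem` derives the spec endpoint from fixed positions in `targetUrl.split("/")` (index 2 for the host, index 4 for the product segment), which only works for URLs of exactly that depth. The sketch below shows the same derivation with `java.net.URI`, under stated assumptions: `VansSpecUrl` and `retainDigits` are hypothetical names (the latter stands in for `SpiderUtil.retainNumber`, whose body is not shown in this diff), the example URL is invented, and the product id is taken from the last path segment rather than from index 4.

```java
import java.net.URI;

/** Hypothetical sketch (not in the commit): builds the Vans spec URL from a product URL. */
final class VansSpecUrl {

    /** Stand-in for SpiderUtil.retainNumber: drops every non-digit character. */
    static String retainDigits(String s) {
        return s.replaceAll("\\D", "");
    }

    static String build(String productUrl) {
        URI uri = URI.create(productUrl);
        String host = uri.getHost();
        String path = uri.getPath();
        if (host == null || path == null) {
            throw new IllegalArgumentException("Not an absolute product URL: " + productUrl);
        }
        // Last path segment, e.g. "product-1234.html"; the query string is not part of getPath()
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);
        String pId = retainDigits(lastSegment);
        if (pId.isEmpty()) {
            throw new IllegalArgumentException("No product id in: " + productUrl);
        }
        return "https://" + host + "/wap/product-ajax_product_spec-" + pId + ".html";
    }

    public static void main(String[] args) {
        // Invented URL, shaped to match the "wap/product" rule in SpiderServiceImpl
        System.out.println(build("https://www.vans.com.cn/wap/product-1234.html"));
    }
}
```

Parsing with `URI` keeps query strings out of the id and fails with a clear message when no id can be found.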
src/main/java/com/diaoyun/zion/chinafrica/service/impl/SpiderServiceImpl.java (view file @ 9c9bfa0d)
@@ -47,7 +47,7 @@ public class SpiderServiceImpl implements SpiderService {
             platformEnum = PlatformEnum.TB;
         } else if (targetUrl.contains("tmall.com/item.htm")) {
             platformEnum = PlatformEnum.TM;
-        } else if (targetUrl.contains("pullandbear.cn/cn/")) {
+        } else if (targetUrl.contains("pullandbear.cn/")) {
             platformEnum = PlatformEnum.PULLANDBEAR;
         } else if (targetUrl.contains("www.gap.cn/pdp/")) {
             platformEnum = PlatformEnum.GAP;
...
@@ -57,34 +57,33 @@ public class SpiderServiceImpl implements SpiderService {
             platformEnum = PlatformEnum.AfriEshop;
         } else if (targetUrl.contains("zara.cn")) {
             platformEnum = PlatformEnum.ZARA;
-        } else if (targetUrl.contains("uniqlo") && targetUrl.contains("#/product?pid")) {
+        } else if (targetUrl.contains("uniqlo.cn/") && targetUrl.contains("#/product?pid")) {
             platformEnum = PlatformEnum.UNIQLO;
-        } else if (targetUrl.contains("hm.com/") && targetUrl.contains("productpage")) {
+        } else if (targetUrl.contains("hm.com/m") && targetUrl.contains("productpage")) {
             platformEnum = PlatformEnum.HM;
-        } else if (targetUrl.contains("https://www.adidas.com.cn/item")) {
+        } else if (targetUrl.contains("adidas.com") && targetUrl.contains("item")) {
             platformEnum = PlatformEnum.ADIDAS;
         } else if (targetUrl.contains("http://www.lily.sh.cn/webapp/wcs/stores/servlet/lilystore")) {
             platformEnum = PlatformEnum.LILY;
-        } else if (targetUrl.contains("http://wap.ur.com.cn/product/detail")) {
+        } else if (targetUrl.contains("wap.ur") && targetUrl.contains("product")) {
             platformEnum = PlatformEnum.URBANREVIVO;
-        } else if (targetUrl.contains("underarmour.cn/p")) {
+        } else if (targetUrl.contains("underarmour")) {
             platformEnum = PlatformEnum.UNDERARMOUR;
-        } else if (targetUrl.contains("ochirly.com.cn/p/mobile/")) {
+        } else if (targetUrl.contains("ochirly.com") && targetUrl.contains("p/mobile/")) {
             platformEnum = PlatformEnum.OCHIRLY;
-        } else if (targetUrl.contains("esprit.cn/product/")) {
+        } else if (targetUrl.contains("esprit.cn/product/") && targetUrl.contains("styleNo") && targetUrl.contains("skucode")) {
             platformEnum = PlatformEnum.ESPRIT;
-        } else if (targetUrl.contains("levi.com.cn/product/")) {
+        } else if (targetUrl.contains("levi.com") && targetUrl.contains("product")) {
             platformEnum = PlatformEnum.LEVI;
-        } else if (targetUrl.contains("moco.com/moco/zh/p/")) {
+        } else if (targetUrl.contains("moco.com/moco/")) {
             platformEnum = PlatformEnum.MOCO;
-        } else if (targetUrl.contains("massimodutti.cn") && targetUrl.contains("colorId") && targetUrl.contains("categoryId")) {
+        } else if (targetUrl.contains("massimodutti") && targetUrl.contains("colorId") && targetUrl.contains("categoryId")) {
             platformEnum = PlatformEnum.MASSIMODUTTI;
-        } else if (targetUrl.contains("coach.com/coach")) {
+        } else if (targetUrl.contains("coach")) {
             platformEnum = PlatformEnum.COACH;
         } else if (targetUrl.contains("vans.com") && targetUrl.contains("wap/product")) {
             platformEnum = PlatformEnum.VANS;
         }
         return platformEnum;
     }
 }
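Every branch in the `else if` chain above has the same shape: a platform is chosen when all of a few fragments occur in the URL. One possible way to express that as data is sketched below. This is an illustration under assumptions, not the project's implementation: `PlatformRules` is a hypothetical class, platform names are plain strings because `PlatformEnum` is not shown in this diff, and only a handful of the rules from the new side of the diff are copied in.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch (not in the commit): URL-to-platform rules kept as data. */
final class PlatformRules {
    // Platform name -> fragments that must ALL occur in the URL.
    // LinkedHashMap preserves insertion order, so earlier rules win, as in the else-if chain.
    private static final Map<String, List<String>> RULES = new LinkedHashMap<>();

    static {
        RULES.put("PULLANDBEAR", Arrays.asList("pullandbear.cn/"));
        RULES.put("UNIQLO", Arrays.asList("uniqlo.cn/", "#/product?pid"));
        RULES.put("HM", Arrays.asList("hm.com/m", "productpage"));
        RULES.put("ADIDAS", Arrays.asList("adidas.com", "item"));
        RULES.put("ESPRIT", Arrays.asList("esprit.cn/product/", "styleNo", "skucode"));
        RULES.put("VANS", Arrays.asList("vans.com", "wap/product"));
    }

    /** Returns the first platform whose fragments all occur in the URL, or empty. */
    static Optional<String> match(String targetUrl) {
        for (Map.Entry<String, List<String>> rule : RULES.entrySet()) {
            if (rule.getValue().stream().allMatch(targetUrl::contains)) {
                return Optional.of(rule.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(match("https://m2.hm.com/m/zh_cn/productpage.0806412004.html"));
    }
}
```

With the rules in a table, loosening or tightening a match (as this commit does for most brands) becomes a one-line data change, and the ordering that decides ties stays visible in one place.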
src/main/java/com/diaoyun/zion/master/util/spider/HMSpiderParse.java (view file @ 9c9bfa0d)
@@ -17,7 +17,7 @@ import static com.diaoyun.zion.master.util.spider.SpiderUtil.exchangeRate;
 /**
  * H&M crawler data parsing
- *
+ * @see com.diaoyun.zion.chinafrica.bis.impl.HmSpider
  * @author 爱酱油不爱醋
  */
 public class HMSpiderParse {
...
@@ -29,10 +29,12 @@ public class HMSpiderParse {
      */
     public static ProductResponse formatProductResponse(String content) {
+        // targetUrl=https://m2.hm.com/m/zh_cn/productpage.0806412004.html
         // Extract the main data and convert it to JSON data and a Document object
         String detailStr = JsoupUtil.getScriptContent(content, "productArticleDetails");
         int firstBrackets = detailStr.indexOf("{");
         int lastbrackets = detailStr.lastIndexOf("}");
         String resultStr = detailStr.substring(firstBrackets, lastbrackets + 1);
         resultStr = resultStr.replaceAll("\'", "\"")
                 .replaceAll("\"image\": isDesktop [?] ", "")
...
src/main/java/com/diaoyun/zion/master/util/spider/PullAndBearSpiderParse.java (deleted, mode 100644 → 0; view file @ e91ff555)
-package com.diaoyun.zion.master.util.spider;
-
-import com.diaoyun.zion.chinafrica.enums.PlatformEnum;
-import com.diaoyun.zion.chinafrica.vo.*;
-import net.sf.json.JSONArray;
-import net.sf.json.JSONObject;
-
-import java.math.BigDecimal;
-import java.util.*;
-
-/**
- * PullAndBear crawler data parsing
- * @see com.diaoyun.zion.chinafrica.bis.impl.PullandbearSpider data crawler
- * @author 爱酱油不爱醋
- */
-public class PullAndBearSpiderParse {
-
-    /**
-     * Format the data returned by PullAndBear
-     * @param dataMap the main JSON data
-     * @param pId id taken from the product link
-     * @return the formatted data
-     */
-    public static ProductResponse formatProductResponse(JSONObject dataMap, String pId) {
-        // Declare the wrapper object
-        ProductResponse productResponse = new ProductResponse();
-        // Properties: Zara product properties are color and size
-        Map<String, Set<ProductProp>> productPropSet = new HashMap<>(16);
-        // Original prices
-        List<OriginalPrice> originalPriceList = new ArrayList<>();
-        // Promotional prices
-        List<ProductPromotion> promotionList = new ArrayList<>();
-        // Stock
-        DynStock dynStock = new DynStock();
-        // The data does not actually contain an exact stock count, so default to ample stock
-        dynStock.setSellableQuantity(9999);
-        // Basic product info
-        ItemInfo itemInfo = new ItemInfo();
-        // Get the bundleProductSummaries node object
-        JSONObject bundleProductSummariesObj = dataMap.getJSONArray("bundleProductSummaries").getJSONObject(0);
-        //////////////////////////////////// Get basic product info ////////////////////////////////////////////
-        itemInfo.setShopName(PlatformEnum.PULLANDBEAR.getLabel());
-        itemInfo.setShopUrl("https://www.pullandbear.cn/cn/");
-        itemInfo.setItemId(pId);
-        itemInfo.setTitle(bundleProductSummariesObj.getString("name"));
-        //////////////////////////////////// Get basic product info END (the image is set below) ////////////////////////////////////////////
-        // Get the colors array node
-        JSONArray colorsArr = bundleProductSummariesObj.getJSONObject("detail").getJSONArray("colors");
-        for (int i = 0; i < colorsArr.size(); i++) {
-            JSONObject colorsObj = colorsArr.getJSONObject(i);
-            //////////////////////////////////// Get product color and image properties ////////////////////////////////////////////
-            Set<ProductProp> propSetColor = new HashSet<>(16);
-            ProductProp productPropColor = new ProductProp();
-            // Color id
-            String colorNo = colorsObj.getString("id");
-            productPropColor.setPropId(colorNo);
-            // Color name
-            String colorName = colorsObj.getString("name");
-            productPropColor.setPropName(colorName);
-            // Get the image object node
-            JSONObject imageObj = colorsObj.getJSONObject("image");
-            // Color image
-            String imageUrl = "https://static.pullandbear.cn/2/photos/" + imageObj.getString("url") + "_2_1_8.jpg?t=" + imageObj.getString("timestamp");
-            productPropColor.setImage(imageUrl);
-            if (i == 0) {
-                itemInfo.setPic(imageUrl);
-            }
-            propSetColor.add(productPropColor);
-            if (productPropSet.get("颜色") == null) {
-                productPropSet.put("颜色", propSetColor);
-            } else {
-                Set<ProductProp> oldPropSet = productPropSet.get("颜色");
-                propSetColor.addAll(oldPropSet);
-                productPropSet.put("颜色", propSetColor);
-            }
-            //////////////////////////////////// Get product color and image properties END ////////////////////////////////////////////
-            // Get the sizes object array
-            JSONArray sizesArr = colorsObj.getJSONArray("sizes");
-            for (int j = 0; j < sizesArr.size(); j++) {
-                JSONObject sizesObj = sizesArr.getJSONObject(j);
-                ///////////////////////// Get product size properties ////////////////////
-                Set<ProductProp> sizePropSetSize = new HashSet<>();
-                ProductProp productPropSize = new ProductProp();
-                String size = sizesObj.getString("name");
-                productPropSize.setPropName(size);
-                String sizeNo = sizesObj.getString("sku");
-                productPropSize.setPropId(sizeNo);
-                sizePropSetSize.add(productPropSize);
-                if (productPropSet.get("尺码") == null) {
-                    productPropSet.put("尺码", sizePropSetSize);
-                } else {
-                    Set<ProductProp> oldPropSet = productPropSet.get("尺码");
-                    sizePropSetSize.addAll(oldPropSet);
-                    productPropSet.put("尺码", sizePropSetSize);
-                }
-                ///////////////////////// Get product size properties END////////////////////
-                // Stock id of the product
-                String skuStr = ";" + colorNo + ";" + sizeNo + ";";
-                //////////////////////////////////// Get stock ////////////////////////////////////////////
-                // Set: the product includes stock information
-                productResponse.setStockFlag(true);
-                List<ProductSkuStock> productSkuStockList = dynStock.getProductSkuStockList();
-                if (productSkuStockList == null) {
-                    productSkuStockList = new ArrayList<>();
-                }
-                ProductSkuStock productSkuStock = new ProductSkuStock();
-                // Set: sellable quantity; PullAndBear provides no usable stock data
-                productSkuStock.setSellableQuantity(999);
-                // Set: the id this stock entry belongs to
-                productSkuStock.setSkuStr(skuStr);
-                productSkuStockList.add(productSkuStock);
-                dynStock.setProductSkuStockList(productSkuStockList);
-                //////////////////////////////////// Get stock END/////////////////////////////////////////
-                //////////////////////////////////// Get original price //////////////////////////////////
-                OriginalPrice originalPrice = new OriginalPrice();
-                // Get the product's original price
-                String fullPrice = sizesObj.getString("price");
-                BigDecimal priceOld = new BigDecimal(fullPrice);
-                BigDecimal div = new BigDecimal("100");
-                BigDecimal priceNew = priceOld.divide(div, 2, BigDecimal.ROUND_DOWN);
-                // TODO convert the exchange rate; prices are currently in RMB
-                fullPrice = SpiderUtil.exchangeRate(priceNew.toString());
-                originalPrice.setPrice(fullPrice);
-                productResponse.setPrice(fullPrice);
-                productResponse.setSalePrice(fullPrice + "-" + fullPrice);
-                originalPrice.setSkuStr(skuStr);
-                originalPriceList.add(originalPrice);
-                //////////////////////////////////// Get original price END//////////////////////////////////
-            }
-        }
-        // Fill in the JSON data in the following order
-        productResponse.setPropFlag(true);
-        productResponse.setProductPropSet(productPropSet);
-        productResponse.setPlatform(PlatformEnum.PULLANDBEAR.getValue());
-        productResponse.setPromotionList(promotionList);
-        productResponse.setOriginalPriceList(originalPriceList);
-        productResponse.setItemInfo(itemInfo);
-        productResponse.setDynStock(dynStock);
-        return productResponse;
-    }
-}
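The parser above divides the raw `price` field by 100 with `BigDecimal.ROUND_DOWN`, which suggests the API reports prices in hundredths of the currency unit. The int rounding constants on `BigDecimal` have been deprecated since Java 9 in favor of `java.math.RoundingMode`. A minimal sketch of the same conversion follows, assuming a hypothetical helper `PriceUnits` that is not part of this commit:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical helper (not in the commit): converts prices given in hundredths. */
final class PriceUnits {
    private static final BigDecimal HUNDRED = new BigDecimal("100");

    /** Converts e.g. "19900" to "199.00", truncating beyond two decimals. */
    static String fromHundredths(String raw) {
        return new BigDecimal(raw.trim())
                .divide(HUNDRED, 2, RoundingMode.DOWN)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(fromHundredths("19900"));
    }
}
```

Keeping the division in one place would also cover the identical `priceOld` / `div` / `priceNew` sequences that appear in `SpiderUtil`.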
src/main/java/com/diaoyun/zion/master/util/spider/SpiderUtil.java
浏览文件 @
9c9bfa0d
...
@@ -27,6 +27,7 @@ public class SpiderUtil {
...
@@ -27,6 +27,7 @@ public class SpiderUtil {
/**
/**
* 转换汇率
* 转换汇率
*
* @param fullPrice
* @param fullPrice
* @return
* @return
*/
*/
...
@@ -40,6 +41,7 @@ public class SpiderUtil {
...
@@ -40,6 +41,7 @@ public class SpiderUtil {
/**
/**
* 去除除了数字之外的所有字符
* 去除除了数字之外的所有字符
*
* @param str 字符串
* @param str 字符串
* @return 只有数字的字符串
* @return 只有数字的字符串
*/
*/
...
@@ -81,16 +83,16 @@ public class SpiderUtil {
...
@@ -81,16 +83,16 @@ public class SpiderUtil {
OriginalPrice
originalPrice
=
new
OriginalPrice
();
OriginalPrice
originalPrice
=
new
OriginalPrice
();
originalPrice
.
setSkuStr
(
skuStr
);
originalPrice
.
setSkuStr
(
skuStr
);
String
listPrice
=
skuValue
.
getString
(
"listPrice"
);
String
listPrice
=
skuValue
.
getString
(
"listPrice"
);
//转换汇率
//转换汇率
listPrice
=
exchangeRate
(
listPrice
);
listPrice
=
exchangeRate
(
listPrice
);
originalPrice
.
setPrice
(
listPrice
);
originalPrice
.
setPrice
(
listPrice
);
originalPriceList
.
add
(
originalPrice
);
originalPriceList
.
add
(
originalPrice
);
//促销价格
//促销价格
if
(
StringUtils
.
isNotBlank
(
skuValue
.
getString
(
"salePrice"
)))
{
if
(
StringUtils
.
isNotBlank
(
skuValue
.
getString
(
"salePrice"
)))
{
String
salePrice
=
skuValue
.
getString
(
"salePrice"
);
String
salePrice
=
skuValue
.
getString
(
"salePrice"
);
//转换汇率
//转换汇率
salePrice
=
exchangeRate
(
salePrice
);
salePrice
=
exchangeRate
(
salePrice
);
productResponse
.
setPromotionFlag
(
true
);
productResponse
.
setPromotionFlag
(
true
);
ProductPromotion
productPromotion
=
new
ProductPromotion
();
ProductPromotion
productPromotion
=
new
ProductPromotion
();
productPromotion
.
setSkuStr
(
skuStr
);
productPromotion
.
setSkuStr
(
skuStr
);
...
@@ -146,16 +148,16 @@ public class SpiderUtil {
...
@@ -146,16 +148,16 @@ public class SpiderUtil {
}
}
String
minPrice
=
dataMap
.
getString
(
"minPrice"
);
String
minPrice
=
dataMap
.
getString
(
"minPrice"
);
String
maxPrice
=
dataMap
.
getString
(
"maxPrice"
);
String
maxPrice
=
dataMap
.
getString
(
"maxPrice"
);
//转换汇率
//转换汇率
minPrice
=
exchangeRate
(
minPrice
);
minPrice
=
exchangeRate
(
minPrice
);
maxPrice
=
exchangeRate
(
maxPrice
);
maxPrice
=
exchangeRate
(
maxPrice
);
//一口价
//一口价
productResponse
.
setPrice
(
minPrice
+
"-"
+
maxPrice
);
productResponse
.
setPrice
(
minPrice
+
"-"
+
maxPrice
);
//一口价
//一口价
productResponse
.
setSalePrice
(
minPrice
+
"-"
+
maxPrice
);
productResponse
.
setSalePrice
(
minPrice
+
"-"
+
maxPrice
);
//没有库存信息 需要另外获取
//没有库存信息 需要另外获取
productResponse
.
setStockFlag
(
false
);
productResponse
.
setStockFlag
(
false
);
//有商品属性
//有商品属性
...
@@ -202,10 +204,10 @@ public class SpiderUtil {
...
@@ -202,10 +204,10 @@ public class SpiderUtil {
////////////////////////////////////获取价格和商品属性////////////////////////////////////////////
////////////////////////////////////获取价格和商品属性////////////////////////////////////////////
String
fullPrice
=
itemDetail
.
getString
(
"fullPrice"
);
String
fullPrice
=
itemDetail
.
getString
(
"fullPrice"
);
//转换汇率
//转换汇率
fullPrice
=
exchangeRate
(
fullPrice
);
fullPrice
=
exchangeRate
(
fullPrice
);
String
currentPrice
=
itemDetail
.
getString
(
"currentPrice"
);
String
currentPrice
=
itemDetail
.
getString
(
"currentPrice"
);
//转换汇率
//转换汇率
currentPrice
=
exchangeRate
(
currentPrice
);
currentPrice
=
exchangeRate
(
currentPrice
);
productResponse
.
setPrice
(
fullPrice
);
productResponse
.
setPrice
(
fullPrice
);
JSONArray
skusArr
=
itemDetail
.
getJSONArray
(
"skus"
);
JSONArray
skusArr
=
itemDetail
.
getJSONArray
(
"skus"
);
//获取商品尺码属性,同时记录下skuid和尺码关系
//获取商品尺码属性,同时记录下skuid和尺码关系
...
@@ -347,7 +349,7 @@ public class SpiderUtil {
...
@@ -347,7 +349,7 @@ public class SpiderUtil {
//属性
//属性
JSONArray
itemOptionsArray
=
variantsArray
.
getJSONObject
(
i
).
getJSONArray
(
"options"
);
JSONArray
itemOptionsArray
=
variantsArray
.
getJSONObject
(
i
).
getJSONArray
(
"options"
);
//没有属性的时候,会返回 Default Title
//没有属性的时候,会返回 Default Title
if
(
"Default Title"
.
equalsIgnoreCase
(
itemOptionsArray
.
getString
(
0
)))
{
if
(
"Default Title"
.
equalsIgnoreCase
(
itemOptionsArray
.
getString
(
0
)))
{
break
;
break
;
}
}
String
skuStr
=
";"
;
String
skuStr
=
";"
;
@@ -357,7 +359,7 @@ public class SpiderUtil {
             /////////////////// Original price ////////////////////////////////////
             OriginalPrice originalPrice = new OriginalPrice();
             String price = variantsArray.getJSONObject(i).getString("price");
             BigDecimal priceOld = new BigDecimal(price);
             BigDecimal div = new BigDecimal("100");
             BigDecimal priceNew = priceOld.divide(div, 2, BigDecimal.ROUND_DOWN);
             originalPrice.setPrice(priceNew.toString());
@@ -411,23 +413,25 @@ public class SpiderUtil {
         productResponse.setItemInfo(itemInfo);
         productResponse.setDynStock(dynStock);
         String price = resultObj.getString("price");
         BigDecimal priceOld = new BigDecimal(price);
         BigDecimal div = new BigDecimal("100");
         BigDecimal priceNew = priceOld.divide(div, 2, BigDecimal.ROUND_DOWN);
         productResponse.setPrice(priceNew.toString());
         return productResponse;
     }
     /**
-     * Format the data returned by H&M
+     * Format the data returned by PullAndBear
-     * @see com.diaoyun.zion.chinafrica.bis.impl.HmSpider
+     *
      * @param dataMap the main JSON data
+     * @param pId the id from the product link
      * @return the formatted data
+     * @see com.diaoyun.zion.chinafrica.bis.impl.PullandbearSpider
      */
-    public static ProductResponse formatHMProductResponse(JSONObject dataMap) {
+    public static ProductResponse formatPullAndBearProductResponse(JSONObject dataMap, String pId) {
         // Declare the wrapper object
         ProductResponse productResponse = new ProductResponse();
-        // Attributes: Zara products have color and size attributes
+        // Attributes: the product has color and size attributes
         Map<String, Set<ProductProp>> productPropSet = new HashMap<>(16);
         // Original price
         List<OriginalPrice> originalPriceList = new ArrayList<>();
@@ -440,6 +444,117 @@ public class SpiderUtil {
         // Basic product information
         ItemInfo itemInfo = new ItemInfo();
+        // Get the bundleProductSummaries node object
+        JSONObject bundleProductSummariesObj = dataMap.getJSONArray("bundleProductSummaries").getJSONObject(0);
+        //////////////////////////////////// Basic product information ////////////////////////////////////////////
+        itemInfo.setShopName("PullAndBear");
+        itemInfo.setShopUrl("https://www.pullandbear.cn/");
+        itemInfo.setItemId(pId);
+        itemInfo.setTitle(bundleProductSummariesObj.getString("name"));
+        //////////////////////////////////// Basic product information END (the image is set below) ////////////////////////////////////////////
+        // Get the colors array node
+        JSONArray colorsArr = bundleProductSummariesObj.getJSONObject("detail").getJSONArray("colors");
+        Set<ProductProp> propSetColor = new HashSet<>(16);
+        Set<ProductProp> sizePropSetSize = new HashSet<>(16);
+        List<ProductSkuStock> productSkuStockList = dynStock.getProductSkuStockList();
+        productResponse.setStockFlag(true);
+        for (int i = 0; i < colorsArr.size(); i++) {
+            JSONObject colorsObj = colorsArr.getJSONObject(i);
+            //////////////////////////////////// Product color and image attributes ////////////////////////////////////////////
+            ProductProp productPropColor = new ProductProp();
+            JSONObject imageObj = colorsObj.getJSONObject("image");
+            String imageUrl = "https://static.pullandbear.cn/2/photos/" + imageObj.getString("url") + "_2_1_8.jpg?t=" + imageObj.getString("timestamp");
+            String colorNo = colorsObj.getString("id");
+            String color = colorsObj.getString("name");
+            productPropColor.setPropId(colorNo);
+            productPropColor.setPropName(color);
+            productPropColor.setImage(imageUrl);
+            if (i == 0) {
+                itemInfo.setPic(imageUrl);
+            }
+            propSetColor.add(productPropColor);
+            if (productPropSet.get("颜色") == null) {
+                productPropSet.put("颜色", propSetColor);
+            } else {
+                Set<ProductProp> oldPropSet = productPropSet.get("颜色");
+                propSetColor.addAll(oldPropSet);
+                productPropSet.put("颜色", propSetColor);
+            }
+            //////////////////////////////////// Product color and image attributes END ////////////////////////////////////////////
+            // Get the sizes object array
+            JSONArray sizesArr = colorsObj.getJSONArray("sizes");
+            for (int j = 0; j < sizesArr.size(); j++) {
+                JSONObject sizesObj = sizesArr.getJSONObject(j);
+                ///////////////////////// Product size attributes ////////////////////
+                ProductProp productPropSize = new ProductProp();
+                String sizeNo = sizesObj.getString("sku");
+                String size = sizesObj.getString("name");
+                productPropSize.setPropName(size);
+                productPropSize.setPropId(sizeNo);
+                sizePropSetSize.add(productPropSize);
+                if (productPropSet.get("尺码") == null) {
+                    productPropSet.put("尺码", sizePropSetSize);
+                } else {
+                    Set<ProductProp> oldPropSet = productPropSet.get("尺码");
+                    sizePropSetSize.addAll(oldPropSet);
+                    productPropSet.put("尺码", sizePropSetSize);
+                }
+                ///////////////////////// Product size attributes END ////////////////////
+                // Stock id of the product
+                String skuStr = ";" + colorNo + ";" + sizeNo + ";";
+                //////////////////////////////////// Stock ////////////////////////////////////////////
+                // Mark that the product includes stock information
+                ProductSkuStock productSkuStock = new ProductSkuStock();
+                productSkuStock.setSkuStr(skuStr);
+                productSkuStock.setSellableQuantity(999);
+                if (productSkuStockList == null) {
+                    productSkuStockList = new ArrayList<>();
+                }
+                productSkuStockList.add(productSkuStock);
+                //////////////////////////////////// Stock END /////////////////////////////////////////
+                //////////////////////////////////// Original price //////////////////////////////////
+                OriginalPrice originalPrice = new OriginalPrice();
+                // Get the product's original price
+                String fullPrice = sizesObj.getString("price");
+                BigDecimal priceOld = new BigDecimal(fullPrice);
+                BigDecimal div = new BigDecimal("100");
+                BigDecimal priceNew = priceOld.divide(div, 2, BigDecimal.ROUND_DOWN);
+                // TODO convert the exchange rate; prices are currently in RMB
+                fullPrice = SpiderUtil.exchangeRate(priceNew.toString());
+                originalPrice.setSkuStr(skuStr);
+                originalPrice.setPrice(fullPrice);
+                originalPriceList.add(originalPrice);
+                productResponse.setPrice(fullPrice);
+                productResponse.setSalePrice(fullPrice + "-" + fullPrice);
+                //////////////////////////////////// Original price END //////////////////////////////////
+            }
+            dynStock.setProductSkuStockList(productSkuStockList);
+        }
         // Populate the JSON data in the following order
         productResponse.setPropFlag(true);
         productResponse.setProductPropSet(productPropSet);
@@ -450,5 +565,7 @@ public class SpiderUtil {
         productResponse.setDynStock(dynStock);
         return productResponse;
     }
 }
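The price handling in the hunks above divides the raw price string by 100 and truncates to two decimals, which suggests the upstream JSON reports prices in minor units (an assumption; the diff does not say so). `BigDecimal.ROUND_DOWN` and the `divide(BigDecimal, int, int)` overload have been deprecated since Java 9 in favour of `RoundingMode`. A minimal sketch of the same conversion with the non-deprecated overload follows; the class and method names are hypothetical and not part of this commit.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical helper for illustration only; not part of the commit. */
final class PriceConversionSketch {

    private static final BigDecimal HUNDRED = new BigDecimal("100");

    /**
     * Converts a price string in minor units (assumed: hundredths) into a
     * two-decimal string, truncating instead of rounding, as the diff does.
     */
    static String minorUnitsToPrice(String rawPrice) {
        return new BigDecimal(rawPrice)
                .divide(HUNDRED, 2, RoundingMode.DOWN)
                .toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(minorUnitsToPrice("12999"));
        System.out.println(minorUnitsToPrice("5"));
    }
}
```

Keeping the divisor and rounding mode in one helper would also remove the three near-identical `priceOld` / `div` / `priceNew` sequences that appear in `SpiderUtil`.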
src/main/java/com/diaoyun/zion/master/util/spider/VansSpiderParse.java
0 → 100644
View file @ 9c9bfa0d
+package com.diaoyun.zion.master.util.spider;
+import com.diaoyun.zion.chinafrica.enums.PlatformEnum;
+import com.diaoyun.zion.chinafrica.vo.*;
+import org.jsoup.Jsoup;
+import org.jsoup.nodes.Document;
+import org.jsoup.select.Elements;
+import java.util.*;
+import java.util.regex.Pattern;
+/**
+ * Vans (范斯) spider data parsing
+ * @see com.diaoyun.zion.chinafrica.bis.impl.VansSpider
+ * @author 爱酱油不爱醋
+ */
+public class VansSpiderParse {
+    /**
+     * Format the returned data
+     * @param content the main page data
+     * @return the formatted data
+     */
+    public static ProductResponse formatProductResponse(String content, String pId, String pTitle) {
+        // Declare the wrapper object
+        ProductResponse productResponse = new ProductResponse();
+        // Attributes: Zara products have color and size attributes
+        Map<String, Set<ProductProp>> productPropSet = new HashMap<>(16);
+        // Original price
+        List<OriginalPrice> originalPriceList = new ArrayList<>();
+        // Promotional prices
+        List<ProductPromotion> promotionList = new ArrayList<>();
+        // Stock
+        DynStock dynStock = new DynStock();
+        // The data does not actually include exact stock counts, so default to ample stock
+        dynStock.setSellableQuantity(9999);
+        // Parse into a Document object
+        Document document = Jsoup.parse(content);
+        Elements colorUrlEle = document.select("div[class=pro-color]").select("a");
+        // Price
+        String fullPrice = Pattern.compile("[^0-9]").matcher(document.select("span[id=spec_price]").text()).replaceAll("").trim();
+        // Color id
+        List<String> colorNoList = colorUrlEle.eachAttr("data-product-id");
+        // Color name
+        List<String> colorList = colorUrlEle.eachText();
+        // Color image
+        List<String> imageList = colorUrlEle.select("img").eachAttr("src");
+        //////////////////////////////////// Basic product information ////////////////////////////
+        ItemInfo itemInfo = new ItemInfo();
+        itemInfo.setShopName(PlatformEnum.VANS.getLabel());
+        itemInfo.setShopUrl("https://www.vans.com");
+        itemInfo.setItemId(pId);
+        itemInfo.setTitle(pTitle);
+        itemInfo.setPic(imageList.get(0));
+        //////////////////////////////////// Basic product information END /////////////////////////
+// //////////////////////////////////// Product color attributes ////////////////////////////
+// for (int i = 0; i < pColorList.size(); i++) {
+// Set<ProductProp> propSet = new HashSet<>(16);
+// ProductProp productPropColor = new ProductProp();
+// productPropColor.setPropName(pColorList.get(i));
+// productPropColor.setPropId(pColorNoList.get(i));
+// productPropColor.setImage("https://underarmour.scene7.com/is/image/Underarmour/V5-" + pColorNoList.get(i) + "_FC_Main");
+// propSet.add(productPropColor);
+// if (productPropSet.get("颜色") == null) {
+// productPropSet.put("颜色", propSet);
+// } else {
+// Set<ProductProp> oldPropSet = productPropSet.get("颜色");
+// propSet.addAll(oldPropSet);
+// productPropSet.put("颜色", propSet);
+// }
+// }
+// //////////////////////////////////// Product color attributes END ////////////////////////////////////////////
+//
+// ///////////////////////// Product size attributes ////////////////////
+// for (int i = 0; i < pSizeList.size(); i++) {
+// Set<ProductProp> sizePropSet = new HashSet<>();
+// ProductProp productPropSize = new ProductProp();
+// productPropSize.setPropId(pSizeNoList.get(i));
+// productPropSize.setPropName(pSizeList.get(i));
+// sizePropSet.add(productPropSize);
+// if (productPropSet.get("尺码") == null) {
+// productPropSet.put("尺码", sizePropSet);
+// } else {
+// Set<ProductProp> oldPropSet = productPropSet.get("尺码");
+// sizePropSet.addAll(oldPropSet);
+// productPropSet.put("尺码", sizePropSet);
+// }
+//
+// }
+// ///////////////////////// Product size attributes END ////////////////////
+//
+// //////////////////////////////////// Stock and original price ////////////////////////////////////////////
+// for (String pColorNo : pColorNoList) {
+// for (String pSizeNo : pSizeNoList) {
+// // Build the stock id
+// String skuStr = ";" + pColorNo + ";" + pSizeNo + ";";
+// // Mark that the product includes stock information
+// productResponse.setStockFlag(true);
+// List<ProductSkuStock> productSkuStockList = dynStock.getProductSkuStockList();
+// ProductSkuStock productSkuStock = new ProductSkuStock();
+// OriginalPrice originalPrice = new OriginalPrice();
+// if (productSkuStockList == null) {
+// productSkuStockList = new ArrayList<>();
+// }
+//
+// // Set the sellable quantity; no real stock data is available
+// productSkuStock.setSellableQuantity(999);
+// // Set the id that this stock entry corresponds to
+// productSkuStock.setSkuStr(skuStr);
+// productSkuStockList.add(productSkuStock);
+// dynStock.setProductSkuStockList(productSkuStockList);
+//
+// // TODO convert the exchange rate; prices are currently in RMB
+// String originalFullPrice = SpiderUtil.exchangeRate(fullPrice);
+// originalPrice.setPrice(originalFullPrice);
+// productResponse.setPrice(originalFullPrice);
+// productResponse.setSalePrice(originalFullPrice + "-" + originalFullPrice);
+// originalPrice.setSkuStr(skuStr);
+// originalPriceList.add(originalPrice);
+// }
+// }
+// //////////////////////////////////// Stock and original price END ///////////////////////////////
+        // Populate the JSON data in the following order
+        productResponse.setPropFlag(true);
+        productResponse.setProductPropSet(productPropSet);
+        productResponse.setPlatform(PlatformEnum.UNDERARMOUR.getValue());
+        productResponse.setPromotionList(promotionList);
+        productResponse.setOriginalPriceList(originalPriceList);
+        productResponse.setItemInfo(itemInfo);
+        productResponse.setDynStock(dynStock);
+        return productResponse;
+    }
+}
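`formatProductResponse` derives `fullPrice` by deleting every non-digit character from the text of `span[id=spec_price]`. One consequence worth checking: if that text ever contains a decimal point, the point is deleted as well and the digits run together. The sketch below reproduces that behaviour with plain `java.util.regex` and shows a variant that keeps the point. The class name, method names, and sample strings are hypothetical, and the actual text served by the Vans page is not shown in this diff.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration only; not part of the commit. */
final class PriceTextSketch {

    // Same character class as the diff: everything except ASCII digits.
    private static final Pattern NON_DIGIT = Pattern.compile("[^0-9]");

    // Variant that also keeps the decimal point.
    private static final Pattern NON_DIGIT_OR_POINT = Pattern.compile("[^0-9.]");

    /** Mirrors the expression used for fullPrice in the diff. */
    static String digitsOnly(String text) {
        return NON_DIGIT.matcher(text).replaceAll("").trim();
    }

    /** Keeps digits and '.', so a fractional part survives. */
    static String digitsAndPoint(String text) {
        return NON_DIGIT_OR_POINT.matcher(text).replaceAll("").trim();
    }

    public static void main(String[] args) {
        String sample = "CNY 565.00"; // assumed sample text
        System.out.println(digitsOnly(sample));
        System.out.println(digitsAndPoint(sample));
    }
}
```

If the page only ever shows whole amounts, both variants return the same string and the original expression is sufficient.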