Commit 9c9bfa0d, authored by 梁业锦

Optimize how the spiders decide which product links to crawl

Parent e91ff555
...@@ -36,17 +36,19 @@ ...@@ -36,17 +36,19 @@
- Name: gap - Name: gap
- Spider status: **done** - Spider status: **done**
- Broken; data can no longer be crawled - Broken; data can no longer be crawled
### [Zara](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/ZaraSpider.java) ### [Zara](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/ZaraSpider.java)
- Home page: https://www.zara.cn/cn - Home page: https://www.zara.cn/cn
- Name: zara - Name: zara
- Spider status: **done** - Spider status: **done**
- Possible defects: - Possible defects:
### [Uniqlo(优衣库)](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/UniqloSpider.java) ### [Uniqlo(优衣库)](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/UniqloSpider.java)
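- Home page: https://www.uniqlo.cn/UNIQLO_U19FW_MEN.html - Home page: https://www.uniqlo.cn/UNIQLO_U19FW_MEN.html
- Name: uniqlo - Name: uniqlo
- Spider status: **done** - Spider status: **done**
- The app cannot crawl data - Broken
- The links are protected by anti-crawling measures
- Possible defects: - Possible defects:
- The image path downloads the image directly - The image path downloads the image directly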
...@@ -65,9 +67,8 @@ ...@@ -65,9 +67,8 @@
### [H&M](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/HmSpider.java) ### [H&M](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/HmSpider.java)
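- Home page: https://www2.hm.com/zh_cn/ - Home page: https://www2.hm.com/zh_cn/
- Name: hm - Name: hm
- Spider status: data can now be fetched - Spider status: **done**
- The JSON is wrapped in a way that is hard to handle; the existing tools cannot convert it into valid JSON
- Product colors are distinguished by the product detail page URL; no pattern has been found yet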
### LiLy ### LiLy
- Home page: http://www.lily.sh.cn/webapp/wcs/stores/servlet/lilystore - Home page: http://www.lily.sh.cn/webapp/wcs/stores/servlet/lilystore
...@@ -138,10 +139,11 @@ ...@@ -138,10 +139,11 @@
- Home page: https://www.massimodutti.cn/cn/男装/季末折扣/休闲西装-c1745921.html - Home page: https://www.massimodutti.cn/cn/男装/季末折扣/休闲西装-c1745921.html
- Name: massimodutti - Name: massimodutti
- Spider status: **done** - Spider status: **done**
- Broken
- The links are protected by anti-crawling measures
- Data sources - Data sources
- Product detail: https://www.massimodutti.cn/cn/%E5%A5%B3%E8%A3%85/%E7%B3%BB%E5%88%97/%E8%A1%AC%E8%A1%AB%E5%92%8C%E7%BD%A9%E8%A1%AB/%E8%A1%AC%E8%A1%AB/%E6%BB%91%E9%9B%AA%E9%A3%8E%E7%B3%BB%E5%88%97%E9%A5%B0%E5%8F%A3%E8%A2%8B%E8%A1%AC%E8%A1%AB-c1718602p8730105.html?colorId=420&categoryId=1718602 - Product detail: https://www.massimodutti.cn/cn/%E5%A5%B3%E8%A3%85/%E7%B3%BB%E5%88%97/%E8%A1%AC%E8%A1%AB%E5%92%8C%E7%BD%A9%E8%A1%AB/%E8%A1%AC%E8%A1%AB/%E6%BB%91%E9%9B%AA%E9%A3%8E%E7%B3%BB%E5%88%97%E9%A5%B0%E5%8F%A3%E8%A2%8B%E8%A1%AC%E8%A1%AB-c1718602p8730105.html?colorId=420&categoryId=1718602
- Data API: https://www.massimodutti.cn/itxrest/2/catalog/store/35009478/30359500/category/0/product/8730105/detail?languageId=-7&appId=1 - Data API: https://www.massimodutti.cn/itxrest/2/catalog/store/35009478/30359500/category/0/product/8730105/detail?languageId=-7&appId=1
- Crawling data for the app no longer works
### [COACH(蔻驰)](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/CoachSpider.java) ### [COACH(蔻驰)](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/CoachSpider.java)
- Home page: https://china.coach.com/women.html - Home page: https://china.coach.com/women.html
......
...@@ -4,23 +4,15 @@ import com.diaoyun.zion.chinafrica.bis.IItemSpider; ...@@ -4,23 +4,15 @@ import com.diaoyun.zion.chinafrica.bis.IItemSpider;
import com.diaoyun.zion.chinafrica.enums.PlatformEnum; import com.diaoyun.zion.chinafrica.enums.PlatformEnum;
import com.diaoyun.zion.chinafrica.vo.ProductResponse; import com.diaoyun.zion.chinafrica.vo.ProductResponse;
import com.diaoyun.zion.master.util.HttpClientUtil; import com.diaoyun.zion.master.util.HttpClientUtil;
import com.diaoyun.zion.master.util.JsoupUtil;
import com.diaoyun.zion.master.util.TranslateHelper; import com.diaoyun.zion.master.util.TranslateHelper;
import com.diaoyun.zion.master.util.spider.HMSpiderParse; import com.diaoyun.zion.master.util.spider.HMSpiderParse;
import net.sf.json.JSONObject; import net.sf.json.JSONObject;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.slf4j.Logger; import org.slf4j.Logger;
import org.slf4j.LoggerFactory; import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component; import org.springframework.stereotype.Component;
import javax.print.Doc;
import java.io.IOException; import java.io.IOException;
import java.net.URISyntaxException; import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException; import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException; import java.util.concurrent.TimeoutException;
...@@ -47,6 +39,8 @@ public class HmSpider implements IItemSpider { ...@@ -47,6 +39,8 @@ public class HmSpider implements IItemSpider {
*/ */
@Override @Override
public JSONObject captureItem(String targetUrl) throws URISyntaxException, IOException, ExecutionException, InterruptedException, TimeoutException { public JSONObject captureItem(String targetUrl) throws URISyntaxException, IOException, ExecutionException, InterruptedException, TimeoutException {
String[] spilt = targetUrl.split("productpage.");
targetUrl = "https://www2.hm.com/zh_cn/productpage." + spilt[1];
String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.HM.getValue()); String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.HM.getValue());
ProductResponse productResponse = HMSpiderParse.formatProductResponse(content); ProductResponse productResponse = HMSpiderParse.formatProductResponse(content);
JSONObject resultObj = JSONObject.fromObject(productResponse); JSONObject resultObj = JSONObject.fromObject(productResponse);
...@@ -54,4 +48,22 @@ public class HmSpider implements IItemSpider { ...@@ -54,4 +48,22 @@ public class HmSpider implements IItemSpider {
return resultObj; return resultObj;
} }
// public static void main(String[] args) throws Exception {
// String targetUrl = "https://m2.hm.com/m/zh_cn/productpage.0806412004.html";
// String[] spilt = targetUrl.split("productpage.");
// targetUrl = "https://www2.hm.com/zh_cn/productpage." + spilt[1];
// String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.HM.getValue());
// // Fetch the main data and convert it into JSON data and a Document object
// String detailStr = JsoupUtil.getScriptContent(content, "productArticleDetails");
// int firstBrackets = detailStr.indexOf("{");
// int lastbrackets = detailStr.lastIndexOf("}");
// String resultStr = detailStr.substring(firstBrackets,lastbrackets+1);
// resultStr = resultStr.replaceAll("\'", "\"")
// .replaceAll("\"image\": isDesktop [?] ", "")
// .replaceAll("\"fullscreen\": isDesktop [?] ", "")
// .replaceAll("\"zoom\": isDesktop [?] ", "");
// JSONObject dataMap = JSONObject.fromObject(resultStr);
// Document document = Jsoup.parse(content);
// }
} }
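The new lines in `captureItem` rewrite a mobile H&M link onto the desktop host by splitting on `"productpage."`. Two details are worth keeping in mind: `String.split` treats its argument as a regular expression, so the trailing dot matches any character, and `spilt[1]` throws `ArrayIndexOutOfBoundsException` when the marker is missing. The following is only a minimal sketch of a more defensive variant; the class `HmUrlNormalizer` and the method `toDesktopUrl` are hypothetical names, not part of this commit.

```java
// Hypothetical helper, not part of the commit: rewrites any H&M product link
// onto the desktop host that the commit hard-codes.
final class HmUrlNormalizer {
    private static final String MARKER = "productpage.";
    private static final String DESKTOP_PREFIX = "https://www2.hm.com/zh_cn/";

    static String toDesktopUrl(String targetUrl) {
        // indexOf does a literal search, so the dot is not treated as a regex wildcard
        int idx = targetUrl.indexOf(MARKER);
        if (idx < 0) {
            throw new IllegalArgumentException("Not an H&M product page URL: " + targetUrl);
        }
        // Keep everything from "productpage." onwards and prepend the desktop prefix
        return DESKTOP_PREFIX + targetUrl.substring(idx);
    }

    public static void main(String[] args) {
        // Sample link taken from the commented-out main method in the diff
        System.out.println(toDesktopUrl("https://m2.hm.com/m/zh_cn/productpage.0806412004.html"));
    }
}
```

For the sample link in the diff this yields the same string as the committed `split`-based code; the difference is only in how malformed input is reported.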
...@@ -48,7 +48,6 @@ public class LilySpider implements IItemSpider { ...@@ -48,7 +48,6 @@ public class LilySpider implements IItemSpider {
String targetUrl = "http://www.lily.sh.cn/webapp/wcs/stores/servlet/lilystore/24003/276409"; String targetUrl = "http://www.lily.sh.cn/webapp/wcs/stores/servlet/lilystore/24003/276409";
String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.LILY.getValue()); String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.LILY.getValue());
Document document = Jsoup.parse(content); Document document = Jsoup.parse(content);
System.out.println(content);
System.err.println(document); System.err.println(document);
} }
} }
...@@ -4,8 +4,8 @@ import com.diaoyun.zion.chinafrica.bis.IItemSpider; ...@@ -4,8 +4,8 @@ import com.diaoyun.zion.chinafrica.bis.IItemSpider;
import com.diaoyun.zion.chinafrica.enums.PlatformEnum; import com.diaoyun.zion.chinafrica.enums.PlatformEnum;
import com.diaoyun.zion.chinafrica.vo.ProductResponse; import com.diaoyun.zion.chinafrica.vo.ProductResponse;
import com.diaoyun.zion.master.util.HttpClientUtil; import com.diaoyun.zion.master.util.HttpClientUtil;
import com.diaoyun.zion.master.util.spider.PullAndBearSpiderParse;
import com.diaoyun.zion.master.util.TranslateHelper; import com.diaoyun.zion.master.util.TranslateHelper;
import com.diaoyun.zion.master.util.spider.SpiderUtil;
import net.sf.json.JSONObject; import net.sf.json.JSONObject;
import org.slf4j.Logger; import org.slf4j.Logger;
import org.slf4j.LoggerFactory; import org.slf4j.LoggerFactory;
...@@ -33,7 +33,7 @@ public class PullandbearSpider implements IItemSpider { ...@@ -33,7 +33,7 @@ public class PullandbearSpider implements IItemSpider {
/** /**
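* PullAndBear data spider * PullAndBear data spider
* @see PullAndBearSpiderParse#formatProductResponse method that formats the data * @see SpiderUtil#formatPullAndBearProductResponse method that formats the data
* @param targetUrl the product detail URL received * @param targetUrl the product detail URL received
* @return the formatted and translated JSON data * @return the formatted and translated JSON data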
*/ */
...@@ -43,13 +43,25 @@ public class PullandbearSpider implements IItemSpider { ...@@ -43,13 +43,25 @@ public class PullandbearSpider implements IItemSpider {
targetUrl = PULL_AND_BEAR_URL + pId + "/detail?languageId=-7&appId=1"; targetUrl = PULL_AND_BEAR_URL + pId + "/detail?languageId=-7&appId=1";
String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.PULLANDBEAR.getValue()); String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.PULLANDBEAR.getValue());
JSONObject resultJson = JSONObject.fromObject(content); JSONObject resultJson = JSONObject.fromObject(content);
ProductResponse productResponse = PullAndBearSpiderParse.formatProductResponse(resultJson, pId); ProductResponse productResponse = SpiderUtil.formatPullAndBearProductResponse(resultJson, pId);
resultJson = JSONObject.fromObject(productResponse); resultJson = JSONObject.fromObject(productResponse);
// Translate // Translate
TranslateHelper.translateProductResponse(resultJson); TranslateHelper.translateProductResponse(resultJson);
return resultJson; return resultJson;
} }
// /**
// * How PullAndBear product detail data is fetched
// * @param args
// * @throws Exception
// */
// public static void main(String[] args) throws Exception {
// String targetUrl = "https://www.pullandbear.cn/cn/%25E7%2594%25B7%25E8%25A3%2585/%25E6%259C%258D%25E8%25A3%2585/%25E5%25A4%25A7%25E8%25A1%25A3%25E5%2592%258C%25E5%25A4%25B9%25E5%2585%258B/cazadora-tipo-plumas-costuras-invisibles-c-capucha-c1030204837p501658014.html?cS=800";
// String pId = targetUrl.substring(targetUrl.lastIndexOf("p")+1, targetUrl.lastIndexOf(".html"));
// targetUrl = PULL_AND_BEAR_URL + pId + "/detail?languageId=-7&appId=1";
// String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.PULLANDBEAR.getValue());
// System.err.println(content);
// }
} }
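The product id in `PullandbearSpider` is cut out with `lastIndexOf("p")` and `lastIndexOf(".html")`. That works for the sample link, but a query string containing the letter `p` would move the start index past `.html` and make `substring` throw. As a hedged alternative, the id could be matched with a regular expression. `PullAndBearIdExtractor` and `extractProductId` are hypothetical names, and the shortened URL below is illustrative only.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the commit.
final class PullAndBearIdExtractor {
    // "p", then the numeric product id, then ".html", as in "...c1030204837p501658014.html"
    private static final Pattern PRODUCT_ID = Pattern.compile("p(\\d+)\\.html");

    static Optional<String> extractProductId(String targetUrl) {
        Matcher matcher = PRODUCT_ID.matcher(targetUrl);
        // Empty result instead of an exception when the link has no product id
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Illustrative URL: same tail as the sample in the diff, plus an extra "p" query parameter
        String url = "https://www.pullandbear.cn/cn/x-c1030204837p501658014.html?cS=800&page=2";
        System.out.println(extractProductId(url).orElse("no product id"));
    }
}
```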
......
...@@ -18,7 +18,7 @@ import java.util.concurrent.TimeoutException; ...@@ -18,7 +18,7 @@ import java.util.concurrent.TimeoutException;
/** /**
* Uniqlo data spider * Uniqlo data spider
* * TODO the link cannot be read
* @author 爱酱油不爱醋 * @author 爱酱油不爱醋
*/ */
@Component("uniqloSpider") @Component("uniqloSpider")
......
package com.diaoyun.zion.chinafrica.bis.impl;
import com.diaoyun.zion.chinafrica.bis.IItemSpider;
import com.diaoyun.zion.chinafrica.enums.PlatformEnum;
import com.diaoyun.zion.chinafrica.vo.ProductResponse;
import com.diaoyun.zion.master.util.HttpClientUtil;
import com.diaoyun.zion.master.util.TranslateHelper;
import com.diaoyun.zion.master.util.spider.SpiderUtil;
import com.diaoyun.zion.master.util.spider.VansSpiderParse;
import net.sf.json.JSONObject;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;
import java.io.IOException;
import java.net.URISyntaxException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;
/**
* Vans(范斯) data spider
*
* @author 爱酱油不爱醋
*/
@Component("vansSpider")
public class VansSpider implements IItemSpider {
private static Logger logger = LoggerFactory.getLogger(VansSpider.class);
/**
* Vans data spider
* @see VansSpiderParse#formatProductResponse method that formats the data
* @param targetUrl the product detail URL received
* @return the formatted and translated JSON data
*/
@Override
public JSONObject captureItem(String targetUrl) throws URISyntaxException, IOException, ExecutionException, InterruptedException, TimeoutException {
String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.VANS.getValue());
Document document = Jsoup.parse(content);
String pTitle = document.select("product-titles").text();
String[] spilt = targetUrl.split("/");
String pId = SpiderUtil.retainNumber(spilt[4]);
targetUrl = "https://" + spilt[2] + "/wap/product-ajax_product_spec-" + pId + ".html";
content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.VANS.getValue());
ProductResponse productResponse = VansSpiderParse.formatProductResponse(content, pId, pTitle);
JSONObject resultObj = JSONObject.fromObject(productResponse);
TranslateHelper.translateProductResponse(resultObj);
return resultObj;
}
}
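`VansSpider.captureItem` derives the spec endpoint from fixed positions in `targetUrl.split("/")`: index 2 is the host and index 4 is the path segment holding the product id. A sketch of the same rewrite using `java.net.URI` is shown below, under several assumptions: the class and method names are hypothetical, `retainDigits` merely stands in for `SpiderUtil.retainNumber` (whose implementation is not shown in this diff), the id is taken from the last path segment rather than from index 4, and the host in the example is made up.

```java
import java.net.URI;

// Hypothetical helper, not part of the commit.
final class VansSpecUrlBuilder {
    // Assumed stand-in for SpiderUtil.retainNumber: keep only the digits
    static String retainDigits(String text) {
        return text.replaceAll("[^0-9]", "");
    }

    static String buildSpecUrl(String targetUrl) {
        URI uri = URI.create(targetUrl);
        String host = uri.getHost();
        String path = uri.getPath();
        if (host == null || path == null) {
            throw new IllegalArgumentException("Not an absolute product URL: " + targetUrl);
        }
        // Assumption: the product id lives in the last path segment
        String pId = retainDigits(path.substring(path.lastIndexOf('/') + 1));
        if (pId.isEmpty()) {
            throw new IllegalArgumentException("No product id in: " + targetUrl);
        }
        // Same endpoint shape as in captureItem
        return "https://" + host + "/wap/product-ajax_product_spec-" + pId + ".html";
    }

    public static void main(String[] args) {
        // The host and id below are illustrative, not taken from the source
        System.out.println(buildSpecUrl("https://vans.example.com/wap/product-1234.html"));
    }
}
```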
...@@ -47,7 +47,7 @@ public class SpiderServiceImpl implements SpiderService { ...@@ -47,7 +47,7 @@ public class SpiderServiceImpl implements SpiderService {
platformEnum = PlatformEnum.TB; platformEnum = PlatformEnum.TB;
} else if (targetUrl.contains("tmall.com/item.htm")) { } else if (targetUrl.contains("tmall.com/item.htm")) {
platformEnum = PlatformEnum.TM; platformEnum = PlatformEnum.TM;
} else if (targetUrl.contains("pullandbear.cn/cn/")) { } else if (targetUrl.contains("pullandbear.cn/")) {
platformEnum = PlatformEnum.PULLANDBEAR; platformEnum = PlatformEnum.PULLANDBEAR;
} else if(targetUrl.contains("www.gap.cn/pdp/")) { } else if(targetUrl.contains("www.gap.cn/pdp/")) {
platformEnum=PlatformEnum.GAP; platformEnum=PlatformEnum.GAP;
...@@ -57,34 +57,33 @@ public class SpiderServiceImpl implements SpiderService { ...@@ -57,34 +57,33 @@ public class SpiderServiceImpl implements SpiderService {
platformEnum=PlatformEnum.AfriEshop; platformEnum=PlatformEnum.AfriEshop;
} else if (targetUrl.contains("zara.cn")) { } else if (targetUrl.contains("zara.cn")) {
platformEnum = PlatformEnum.ZARA; platformEnum = PlatformEnum.ZARA;
} else if (targetUrl.contains("uniqlo") && targetUrl.contains("#/product?pid")) { } else if (targetUrl.contains("uniqlo.cn/") && targetUrl.contains("#/product?pid")) {
platformEnum = PlatformEnum.UNIQLO; platformEnum = PlatformEnum.UNIQLO;
} else if (targetUrl.contains("hm.com/") && targetUrl.contains("productpage")) { } else if (targetUrl.contains("hm.com/m") && targetUrl.contains("productpage")) {
platformEnum = PlatformEnum.HM; platformEnum = PlatformEnum.HM;
} else if(targetUrl.contains("https://www.adidas.com.cn/item")) { } else if(targetUrl.contains("adidas.com") && targetUrl.contains("item")) {
platformEnum=PlatformEnum.ADIDAS; platformEnum=PlatformEnum.ADIDAS;
} else if(targetUrl.contains("http://www.lily.sh.cn/webapp/wcs/stores/servlet/lilystore")) { } else if(targetUrl.contains("http://www.lily.sh.cn/webapp/wcs/stores/servlet/lilystore")) {
platformEnum=PlatformEnum.LILY; platformEnum=PlatformEnum.LILY;
} else if(targetUrl.contains("http://wap.ur.com.cn/product/detail")) { } else if(targetUrl.contains("wap.ur") && targetUrl.contains("product")) {
platformEnum=PlatformEnum.URBANREVIVO; platformEnum=PlatformEnum.URBANREVIVO;
} else if(targetUrl.contains("underarmour.cn/p")) { } else if(targetUrl.contains("underarmour")) {
platformEnum=PlatformEnum.UNDERARMOUR; platformEnum=PlatformEnum.UNDERARMOUR;
} else if(targetUrl.contains("ochirly.com.cn/p/mobile/")) { } else if(targetUrl.contains("ochirly.com") && targetUrl.contains("p/mobile/")) {
platformEnum=PlatformEnum.OCHIRLY; platformEnum=PlatformEnum.OCHIRLY;
} else if(targetUrl.contains("esprit.cn/product/")) { } else if(targetUrl.contains("esprit.cn/product/") && targetUrl.contains("styleNo") && targetUrl.contains("skucode")) {
platformEnum=PlatformEnum.ESPRIT; platformEnum=PlatformEnum.ESPRIT;
} else if(targetUrl.contains("levi.com.cn/product/")) { } else if(targetUrl.contains("levi.com") && targetUrl.contains("product")) {
platformEnum=PlatformEnum.LEVI; platformEnum=PlatformEnum.LEVI;
} else if(targetUrl.contains("moco.com/moco/zh/p/")) { } else if(targetUrl.contains("moco.com/moco/")) {
platformEnum=PlatformEnum.MOCO; platformEnum=PlatformEnum.MOCO;
} else if (targetUrl.contains("massimodutti.cn") && targetUrl.contains("colorId") && targetUrl.contains("categoryId")) { } else if (targetUrl.contains("massimodutti") && targetUrl.contains("colorId") && targetUrl.contains("categoryId")) {
platformEnum = PlatformEnum.MASSIMODUTTI; platformEnum = PlatformEnum.MASSIMODUTTI;
} else if (targetUrl.contains("coach.com/coach")) { } else if (targetUrl.contains("coach")) {
platformEnum = PlatformEnum.COACH; platformEnum = PlatformEnum.COACH;
} else if (targetUrl.contains("vans.com") && targetUrl.contains("wap/product")) { } else if (targetUrl.contains("vans.com") && targetUrl.contains("wap/product")) {
platformEnum = PlatformEnum.VANS; platformEnum = PlatformEnum.VANS;
} }
return platformEnum; return platformEnum;
} }
} }
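The platform lookup above is a first-match chain of `contains` checks, so both the order of the branches and the breadth of each fragment matter. One possible way to make that explicit is a small rule table, sketched below. The `Platform` enum is a stand-in for `PlatformEnum`, and `PlatformMatcher`, `Rule` and `detect` are hypothetical names; only four of the rules are reproduced, with the fragments copied from the new side of the diff.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of the commit.
final class PlatformMatcher {
    // Stand-in for PlatformEnum, reduced to the platforms used here
    enum Platform { PULLANDBEAR, HM, COACH, VANS }

    static final class Rule {
        final Platform platform;
        final List<String> fragments;

        Rule(Platform platform, String... fragments) {
            this.platform = platform;
            this.fragments = Arrays.asList(fragments);
        }

        // A rule matches only when the URL contains every fragment
        boolean matches(String url) {
            for (String fragment : fragments) {
                if (!url.contains(fragment)) {
                    return false;
                }
            }
            return true;
        }
    }

    // Evaluated top to bottom, like the if/else chain
    private static final List<Rule> RULES = Arrays.asList(
            new Rule(Platform.PULLANDBEAR, "pullandbear.cn/"),
            new Rule(Platform.HM, "hm.com/m", "productpage"),
            new Rule(Platform.COACH, "coach"),
            new Rule(Platform.VANS, "vans.com", "wap/product"));

    static Platform detect(String url) {
        for (Rule rule : RULES) {
            if (rule.matches(url)) {
                return rule.platform;
            }
        }
        return null; // no rule matched
    }

    public static void main(String[] args) {
        System.out.println(detect("https://m2.hm.com/m/zh_cn/productpage.0806412004.html"));
        System.out.println(detect("https://www2.hm.com/zh_cn/productpage.0806412004.html"));
    }
}
```

With these fragments the mobile H&M link resolves to `HM`, while the desktop `www2.hm.com/zh_cn/` link matches no rule because it lacks `hm.com/m`. A single-word fragment such as `coach` matches any URL that contains that word anywhere.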
...@@ -17,7 +17,7 @@ import static com.diaoyun.zion.master.util.spider.SpiderUtil.exchangeRate; ...@@ -17,7 +17,7 @@ import static com.diaoyun.zion.master.util.spider.SpiderUtil.exchangeRate;
/** /**
* H&M spider data parsing * H&M spider data parsing
* * @see com.diaoyun.zion.chinafrica.bis.impl.HmSpider
* @author 爱酱油不爱醋 * @author 爱酱油不爱醋
*/ */
public class HMSpiderParse { public class HMSpiderParse {
...@@ -29,10 +29,12 @@ public class HMSpiderParse { ...@@ -29,10 +29,12 @@ public class HMSpiderParse {
*/ */
public static ProductResponse formatProductResponse(String content) { public static ProductResponse formatProductResponse(String content) {
// targetUrl=https://m2.hm.com/m/zh_cn/productpage.0806412004.html
// Fetch the main data and convert it into JSON data and a Document object // Fetch the main data and convert it into JSON data and a Document object
String detailStr = JsoupUtil.getScriptContent(content, "productArticleDetails"); String detailStr = JsoupUtil.getScriptContent(content, "productArticleDetails");
int firstBrackets=detailStr.indexOf("{"); int firstBrackets = detailStr.indexOf("{");
int lastbrackets=detailStr.lastIndexOf("}"); int lastbrackets = detailStr.lastIndexOf("}");
String resultStr = detailStr.substring(firstBrackets,lastbrackets+1); String resultStr = detailStr.substring(firstBrackets,lastbrackets+1);
resultStr = resultStr.replaceAll("\'", "\"") resultStr = resultStr.replaceAll("\'", "\"")
.replaceAll("\"image\": isDesktop [?] ", "") .replaceAll("\"image\": isDesktop [?] ", "")
......
package com.diaoyun.zion.master.util.spider;
import com.diaoyun.zion.chinafrica.enums.PlatformEnum;
import com.diaoyun.zion.chinafrica.vo.*;
import net.sf.json.JSONArray;
import net.sf.json.JSONObject;
import java.math.BigDecimal;
import java.util.*;
/**
* PullAndBear spider data parsing
* @see com.diaoyun.zion.chinafrica.bis.impl.PullandbearSpider the data spider
* @author 爱酱油不爱醋
*/
public class PullAndBearSpiderParse {
/**
* Formats the data returned by PullAndBear
* @param dataMap the main JSON data
* @param pId the id taken from the product link
* @return the formatted data
*/
public static ProductResponse formatProductResponse(JSONObject dataMap, String pId) {
// Declare the wrapper object
ProductResponse productResponse = new ProductResponse();
// Properties: PullAndBear products have color and size properties
Map<String, Set<ProductProp>> productPropSet = new HashMap<>(16);
// Original prices
List<OriginalPrice> originalPriceList = new ArrayList<>();
// Promotional prices
List<ProductPromotion> promotionList = new ArrayList<>();
// Stock
DynStock dynStock = new DynStock();
// The data does not include exact stock counts, so default to an ample quantity
dynStock.setSellableQuantity(9999);
// Basic product info
ItemInfo itemInfo = new ItemInfo();
// Take the first bundleProductSummaries node
JSONObject bundleProductSummariesObj = dataMap.getJSONArray("bundleProductSummaries").getJSONObject(0);
//////////////////////////////////// Basic product info ////////////////////////////////////////////
itemInfo.setShopName(PlatformEnum.PULLANDBEAR.getLabel());
itemInfo.setShopUrl("https://www.pullandbear.cn/cn/");
itemInfo.setItemId(pId);
itemInfo.setTitle(bundleProductSummariesObj.getString("name"));
//////////////////////////////////// Basic product info END (the image is set below) ////////////////////////////////////////////
// Take the colors array node
JSONArray colorsArr = bundleProductSummariesObj.getJSONObject("detail").getJSONArray("colors");
for (int i = 0; i < colorsArr.size(); i++) {
JSONObject colorsObj = colorsArr.getJSONObject(i);
//////////////////////////////////// Product color and image properties ////////////////////////////////////////////
Set<ProductProp> propSetColor = new HashSet<>(16);
ProductProp productPropColor = new ProductProp();
// Color id
String colorNo = colorsObj.getString("id");
productPropColor.setPropId(colorNo);
// Color name
String colorName = colorsObj.getString("name");
productPropColor.setPropName(colorName);
// Take the image object node
JSONObject imageObj = colorsObj.getJSONObject("image");
// Color image
String imageUrl = "https://static.pullandbear.cn/2/photos/"
+ imageObj.getString("url")
+ "_2_1_8.jpg?t="
+ imageObj.getString("timestamp");
productPropColor.setImage(imageUrl);
if (i == 0) {
itemInfo.setPic(imageUrl);
}
propSetColor.add(productPropColor);
if (productPropSet.get("颜色") == null) {
productPropSet.put("颜色", propSetColor);
} else {
Set<ProductProp> oldPropSet = productPropSet.get("颜色");
propSetColor.addAll(oldPropSet);
productPropSet.put("颜色", propSetColor);
}
//////////////////////////////////// Product color and image properties END ////////////////////////////////////////////
// Take the sizes object array
JSONArray sizesArr = colorsObj.getJSONArray("sizes");
for (int j = 0; j < sizesArr.size(); j++) {
JSONObject sizesObj = sizesArr.getJSONObject(j);
///////////////////////// Product size properties ////////////////////
Set<ProductProp> sizePropSetSize = new HashSet<>();
ProductProp productPropSize = new ProductProp();
String size = sizesObj.getString("name");
productPropSize.setPropName(size);
String sizeNo = sizesObj.getString("sku");
productPropSize.setPropId(sizeNo);
sizePropSetSize.add(productPropSize);
if (productPropSet.get("尺码") == null) {
productPropSet.put("尺码", sizePropSetSize);
} else {
Set<ProductProp> oldPropSet = productPropSet.get("尺码");
sizePropSetSize.addAll(oldPropSet);
productPropSet.put("尺码", sizePropSetSize);
}
///////////////////////// Product size properties END////////////////////
// Stock id of the product
String skuStr = ";" + colorNo + ";" + sizeNo + ";";
//////////////////////////////////// Stock ////////////////////////////////////////////
// Flag: the product includes stock information
productResponse.setStockFlag(true);
List<ProductSkuStock> productSkuStockList = dynStock.getProductSkuStockList();
if (productSkuStockList == null) {
productSkuStockList = new ArrayList<>();
}
ProductSkuStock productSkuStock = new ProductSkuStock();
// Set the sellable quantity; PullAndBear provides no usable stock data
productSkuStock.setSellableQuantity(999);
// Set the id this stock entry belongs to
productSkuStock.setSkuStr(skuStr);
productSkuStockList.add(productSkuStock);
dynStock.setProductSkuStockList(productSkuStockList);
//////////////////////////////////// Stock END/////////////////////////////////////////
//////////////////////////////////// Original price //////////////////////////////////
OriginalPrice originalPrice = new OriginalPrice();
// Get the product's original price
String fullPrice = sizesObj.getString("price");
BigDecimal priceOld=new BigDecimal(fullPrice);
BigDecimal div = new BigDecimal("100");
BigDecimal priceNew = priceOld.divide(div, 2, BigDecimal.ROUND_DOWN);
// TODO convert the exchange rate; the price is currently in RMB
fullPrice= SpiderUtil.exchangeRate(priceNew.toString());
originalPrice.setPrice(fullPrice);
productResponse.setPrice(fullPrice);
productResponse.setSalePrice(fullPrice + "-" + fullPrice);
originalPrice.setSkuStr(skuStr);
originalPriceList.add(originalPrice);
//////////////////////////////////// Original price END//////////////////////////////////
}
}
// Fill in the JSON data in the following order
productResponse.setPropFlag(true);
productResponse.setProductPropSet(productPropSet);
productResponse.setPlatform(PlatformEnum.PULLANDBEAR.getValue());
productResponse.setPromotionList(promotionList);
productResponse.setOriginalPriceList(originalPriceList);
productResponse.setItemInfo(itemInfo);
productResponse.setDynStock(dynStock);
return productResponse;
}
}
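In `formatProductResponse` the raw `price` string is divided by 100 with two decimals and `BigDecimal.ROUND_DOWN`, which suggests (but the diff does not confirm) that the API reports prices in minor units. The integer rounding constants on `BigDecimal` have been deprecated since Java 9 in favour of `RoundingMode`. A minimal sketch of the same conversion with the enum follows; `PriceConversion` and `toMajorUnits` are hypothetical names and the sample values are made up.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper, not part of the commit.
final class PriceConversion {
    private static final BigDecimal HUNDRED = new BigDecimal("100");

    // Divides by 100, keeps two decimals, truncates any remainder
    static String toMajorUnits(String rawPrice) {
        return new BigDecimal(rawPrice)
                .divide(HUNDRED, 2, RoundingMode.DOWN)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(toMajorUnits("29900")); // 299.00
        System.out.println(toMajorUnits("1999"));  // 19.99
    }
}
```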
package com.diaoyun.zion.master.util.spider;
import com.diaoyun.zion.chinafrica.enums.PlatformEnum;
import com.diaoyun.zion.chinafrica.vo.*;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import java.util.*;
import java.util.regex.Pattern;
/**
* Vans(范斯) spider data parsing
* @see com.diaoyun.zion.chinafrica.bis.impl.VansSpider
* @author 爱酱油不爱醋
*/
public class VansSpiderParse {
/**
* Formats the returned data
* @param content the main page data
* @return the formatted data
*/
public static ProductResponse formatProductResponse(String content, String pId, String pTitle) {
// Declare the wrapper object
ProductResponse productResponse = new ProductResponse();
// Properties: the product has color and size properties
Map<String, Set<ProductProp>> productPropSet = new HashMap<>(16);
// Original prices
List<OriginalPrice> originalPriceList = new ArrayList<>();
// Promotional prices
List<ProductPromotion> promotionList = new ArrayList<>();
// Stock
DynStock dynStock = new DynStock();
// The data does not include exact stock counts, so default to an ample quantity
dynStock.setSellableQuantity(9999);
// Parse into a Document object
Document document = Jsoup.parse(content);
Elements colorUrlEle = document.select("div[class=pro-color]").select("a");
// Price
String fullPrice = Pattern.compile("[^0-9]").matcher(document.select("span[id=spec_price]").text()).replaceAll("").trim();
// Color ids
List<String> colorNoList = colorUrlEle.eachAttr("data-product-id");
// Color names
List<String> colorList = colorUrlEle.eachText();
// Color images
List<String> imageList = colorUrlEle.select("img").eachAttr("src");
//////////////////////////////////// Basic product info ////////////////////////////
ItemInfo itemInfo = new ItemInfo();
itemInfo.setShopName(PlatformEnum.VANS.getLabel());
itemInfo.setShopUrl("https://www.vans.com");
itemInfo.setItemId(pId);
itemInfo.setTitle(pTitle);
itemInfo.setPic(imageList.get(0));
//////////////////////////////////// Basic product info END /////////////////////////
// //////////////////////////////////// Product color properties ////////////////////////////
// for (int i = 0; i < pColorList.size(); i++) {
// Set<ProductProp> propSet = new HashSet<>(16);
// ProductProp productPropColor = new ProductProp();
// productPropColor.setPropName(pColorList.get(i));
// productPropColor.setPropId(pColorNoList.get(i));
// productPropColor.setImage("https://underarmour.scene7.com/is/image/Underarmour/V5-" + pColorNoList.get(i) + "_FC_Main");
// propSet.add(productPropColor);
// if (productPropSet.get("颜色") == null) {
// productPropSet.put("颜色", propSet);
// } else {
// Set<ProductProp> oldPropSet = productPropSet.get("颜色");
// propSet.addAll(oldPropSet);
// productPropSet.put("颜色", propSet);
// }
// }
// //////////////////////////////////// Product color properties END ////////////////////////////////////////////
//
// ///////////////////////// Product size properties ////////////////////
// for (int i = 0; i < pSizeList.size(); i++) {
// Set<ProductProp> sizePropSet = new HashSet<>();
// ProductProp productPropSize = new ProductProp();
// productPropSize.setPropId(pSizeNoList.get(i));
// productPropSize.setPropName(pSizeList.get(i));
// sizePropSet.add(productPropSize);
// if (productPropSet.get("尺码") == null) {
// productPropSet.put("尺码", sizePropSet);
// } else {
// Set<ProductProp> oldPropSet = productPropSet.get("尺码");
// sizePropSet.addAll(oldPropSet);
// productPropSet.put("尺码", sizePropSet);
// }
//
// }
// ///////////////////////// Product size properties END////////////////////
//
// //////////////////////////////////// Stock and original price ////////////////////////////////////////////
// for (String pColorNo : pColorNoList) {
// for (String pSizeNo : pSizeNoList) {
// // Set the stock id
// String skuStr = ";" + pColorNo + ";" + pSizeNo + ";";
// // Flag: the product includes stock information
// productResponse.setStockFlag(true);
// List<ProductSkuStock> productSkuStockList = dynStock.getProductSkuStockList();
// ProductSkuStock productSkuStock = new ProductSkuStock();
// OriginalPrice originalPrice = new OriginalPrice();
// if (productSkuStockList == null) {
// productSkuStockList = new ArrayList<>();
// }
//
// // Set the sellable quantity; no usable stock data is available
// productSkuStock.setSellableQuantity(999);
// // Set the id this stock entry belongs to
// productSkuStock.setSkuStr(skuStr);
// productSkuStockList.add(productSkuStock);
// dynStock.setProductSkuStockList(productSkuStockList);
//
// // TODO convert the exchange rate; the price is currently in RMB
// String originalFullPrice = SpiderUtil.exchangeRate(fullPrice);
// originalPrice.setPrice(originalFullPrice);
// productResponse.setPrice(originalFullPrice);
// productResponse.setSalePrice(originalFullPrice + "-" + originalFullPrice);
// originalPrice.setSkuStr(skuStr);
// originalPriceList.add(originalPrice);
// }
// }
// //////////////////////////////////// Stock and original price END///////////////////////////////
// Fill in the JSON data in the following order
productResponse.setPropFlag(true);
productResponse.setProductPropSet(productPropSet);
productResponse.setPlatform(PlatformEnum.VANS.getValue());
productResponse.setPromotionList(promotionList);
productResponse.setOriginalPriceList(originalPriceList);
productResponse.setItemInfo(itemInfo);
productResponse.setDynStock(dynStock);
return productResponse;
}
}
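`VansSpiderParse` obtains the price by deleting every character matching `[^0-9]` from the `spec_price` text. That also deletes a decimal point, so a label such as `565.00` would become `56500`. Whether the real page ever shows decimals is not visible in this diff, so the following is only a sketch of a parser that would tolerate them. `PriceTextParser` and `parse` are hypothetical names and the sample labels are made up.

```java
import java.math.BigDecimal;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the commit.
final class PriceTextParser {
    // Digits with an optional fractional part
    private static final Pattern NUMBER = Pattern.compile("\\d+(?:\\.\\d+)?");

    static BigDecimal parse(String text) {
        // Drop thousands separators first, then take the first number found
        Matcher matcher = NUMBER.matcher(text.replace(",", ""));
        if (!matcher.find()) {
            throw new IllegalArgumentException("No price in: " + text);
        }
        return new BigDecimal(matcher.group());
    }

    public static void main(String[] args) {
        System.out.println(parse("\u00a5565.00"));  // 565.00
        System.out.println(parse("\u00a5 1,299"));  // 1299
    }
}
```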