Commit 4c640e23, authored by 梁业锦

Added 6 spiders and reorganized the spider code

Parent 5212e546
...@@ -35,21 +35,22 @@ ...@@ -35,21 +35,22 @@
- Homepage: https://www.zara.cn/cn - Homepage: https://www.zara.cn/cn
- Name: zara - Name: zara
- Spider status: **Done** - Spider status: **Done**
### [Uniqlo](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/UniqloSpider.java) ### [Uniqlo(优衣库)](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/UniqloSpider.java)
- Homepage: https://www.uniqlo.cn/UNIQLO_U19FW_MEN.html - Homepage: https://www.uniqlo.cn/UNIQLO_U19FW_MEN.html
- Name: uniqlo - Name: uniqlo
- Spider status: **Done** - Spider status: **Done**
- Known limitations: - Known limitations:
- The image URL downloads the image directly - The image URL downloads the image directly
### [Nike](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/NikeItemSpider.java) ### [Nike(耐克)](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/NikeItemSpider.java)
- Homepage: https://www.nike.com/cn - Homepage: https://www.nike.com/cn
- Name: nike - Name: nike
- Spider status: **Done** - Spider status: **Done**
### [Adidas](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/AdidasSpider.java) ### [Adidas(阿迪达斯)](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/AdidasSpider.java)
- Homepage: https://www.adidas.com.cn/ - Homepage: https://www.adidas.com.cn/
- Name: adidas - Name: adidas
- Spider status: **Done** - Spider status: **Done**
### H&M ### H&M
- Homepage: https://www2.hm.com/zh_cn/ - Homepage: https://www2.hm.com/zh_cn/
- Name: hm - Name: hm
...@@ -77,30 +78,36 @@ ...@@ -77,30 +78,36 @@
- Name: abercrombie - Name: abercrombie
- Spider status: blocked by anti-scraping measures - Spider status: blocked by anti-scraping measures
- Links are encoded as an anti-scraping measure - Links are encoded as an anti-scraping measure
### [Under Armour](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/UnderArmourSpider.java)
### [Under Armour(安德玛)](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/UnderArmourSpider.java)
- Homepage: https://www.underarmour.cn/ - Homepage: https://www.underarmour.cn/
- Name: ur - Name: ur
- Spider status: **Done** - Spider status: **Done**
### Converse 匡威
### Converse(匡威)
- Homepage: https://www.converse.com.cn/ - Homepage: https://www.converse.com.cn/
- Name: converse - Name: converse
- Spider status: a reverse-proxy anti-scraping measure is in place; cannot be crawled for now - Spider status: blocked by anti-scraping measures
### Ochirly - A reverse-proxy anti-scraping measure is in place; cannot be crawled for now
### [Ochirly](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/OchirlySpider.java)
- Homepage: http://www.ochirly.com.cn/SALE/list.shtml - Homepage: http://www.ochirly.com.cn/SALE/list.shtml
- Name: ochirly - Name: ochirly
- Spider status: - Spider status: **Done**
### Esprit
### [Esprit](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/EspritSpider.java)
- Homepage: https://www.esprit.cn/ - Homepage: https://www.esprit.cn/
- Name: esprit - Name: esprit
- Spider status: - Spider status: **Done**
### Levi
### [Levi(李维斯)](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/LeviSpider.java)
- Homepage: https://www.levi.com.cn/sale#page=3 - Homepage: https://www.levi.com.cn/sale#page=3
- Name: levi - Name: levi
- Spider status: - Spider status: **Done**
### Moco ### [MO&Co.](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/MocoSpider.java)
- Homepage: https://www.moco.com/moco/zh/c/BS_DISCOUNT - Homepage: https://www.moco.com/moco/zh/c/BS_DISCOUNT
- Name: moco - Name: moco
- Spider status: - Spider status: **Done**
### [Massimo Dutti](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/MassimoduttiSpider.java) ### [Massimo Dutti](../src/main/java/com/diaoyun/zion/chinafrica/bis/impl/MassimoduttiSpider.java)
- Homepage: https://www.massimodutti.cn/cn/男装/季末折扣/休闲西装-c1745921.html - Homepage: https://www.massimodutti.cn/cn/男装/季末折扣/休闲西装-c1745921.html
- Name: massimodutti - Name: massimodutti
......
package com.diaoyun.zion; package com.diaoyun.zion;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication; import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication; import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.web.servlet.support.SpringBootServletInitializer; import org.springframework.boot.web.servlet.support.SpringBootServletInitializer;
...@@ -19,9 +17,6 @@ public class ZionApplication extends SpringBootServletInitializer { ...@@ -19,9 +17,6 @@ public class ZionApplication extends SpringBootServletInitializer {
public static void main(String[] args) { public static void main(String[] args) {
SpringApplication.run(ZionApplication.class, args); SpringApplication.run(ZionApplication.class, args);
} }
@Test
public void test() throws Exception{
}
} }
package com.diaoyun.zion.chinafrica.bis.impl; package com.diaoyun.zion.chinafrica.bis.impl;
public class EspritSpider { import com.diaoyun.zion.chinafrica.bis.IItemSpider;
import com.diaoyun.zion.chinafrica.enums.PlatformEnum;
import com.diaoyun.zion.chinafrica.vo.ProductResponse;
import com.diaoyun.zion.master.util.HttpClientUtil;
import com.diaoyun.zion.master.util.JsoupUtil;
import com.diaoyun.zion.master.util.TranslateHelper;
import com.diaoyun.zion.master.util.spider.EspritSpiderParse;
import net.sf.json.JSONObject;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;
import java.io.IOException;
import java.net.URISyntaxException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;
/**
 * Esprit (思捷) data spider
 *
 * @author 爱酱油不爱醋
 */
@Component("espritSpider")
public class EspritSpider implements IItemSpider {
    private static final Logger logger = LoggerFactory.getLogger(EspritSpider.class);
    /**
     * Esprit (思捷) product detail page URL
     */
    private static final String ESPRIT_URL = "https://www.esprit.cn/product/";
    /**
     * Crawls an Esprit (思捷) product detail page.
     * @see EspritSpiderParse#formatProductResponse data formatting method
     * @param targetUrl URL of the product detail page
     * @return formatted and translated JSON data
     */
    @Override
    public JSONObject captureItem(String targetUrl) throws InterruptedException, IOException, ExecutionException, URISyntaxException, TimeoutException {
        String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.ESPRIT.getValue());
        JSONObject dataMap = JsoupUtil.getItemDetailByName(content, "window.__INITIAL_STATE__");
        ProductResponse productResponse = EspritSpiderParse.formatProductResponse(dataMap);
        JSONObject resultObj = JSONObject.fromObject(productResponse);
        TranslateHelper.translateProductResponse(resultObj);
        return resultObj;
    }
} }
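The `captureItem` method above relies on `JsoupUtil.getItemDetailByName` to pull the `window.__INITIAL_STATE__` object out of the page; that helper's implementation is not part of this diff. As a rough sketch of how such an extraction could work with only the JDK, assuming the state is assigned as a plain object literal inside a script tag, a brace-matching scan might look like this. `InitialStateExtractor` and `extractObjectLiteral` are hypothetical names, not project code.

```java
import java.util.Optional;

/** Hypothetical helper, not part of the project. */
final class InitialStateExtractor {

    /**
     * Returns the first balanced {...} literal that follows variableName in html,
     * or empty if the variable or a balanced literal cannot be found.
     */
    static Optional<String> extractObjectLiteral(String html, String variableName) {
        int nameAt = html.indexOf(variableName);
        if (nameAt < 0) {
            return Optional.empty();
        }
        int start = html.indexOf('{', nameAt + variableName.length());
        if (start < 0) {
            return Optional.empty();
        }
        int depth = 0;
        boolean inString = false;
        char quote = 0;
        for (int i = start; i < html.length(); i++) {
            char c = html.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++; // skip the escaped character
                } else if (c == quote) {
                    inString = false;
                }
            } else if (c == '"' || c == '\'') {
                inString = true;
                quote = c;
            } else if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    return Optional.of(html.substring(start, i + 1));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up page fragment for illustration only
        String html = "<script>window.__INITIAL_STATE__ = "
                + "{\"product\":{\"details\":{\"code\":\"A1\"}}};</script>";
        System.out.println(extractObjectLiteral(html, "window.__INITIAL_STATE__").orElse("not found"));
    }
}
```

Tracking string state matters here because a `}` inside a product title would otherwise end the scan early.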
package com.diaoyun.zion.chinafrica.bis.impl; package com.diaoyun.zion.chinafrica.bis.impl;
import com.alibaba.druid.support.json.JSONUtils;
import com.diaoyun.zion.chinafrica.bis.IItemSpider; import com.diaoyun.zion.chinafrica.bis.IItemSpider;
import com.diaoyun.zion.chinafrica.enums.PlatformEnum; import com.diaoyun.zion.chinafrica.enums.PlatformEnum;
import com.diaoyun.zion.master.util.HttpClientUtil; import com.diaoyun.zion.master.util.HttpClientUtil;
...@@ -18,6 +17,8 @@ import java.util.concurrent.TimeoutException; ...@@ -18,6 +17,8 @@ import java.util.concurrent.TimeoutException;
/** /**
* H&M data spider * H&M data spider
* *
* TODO: the data is pre-processed (obfuscated); no way to crawl it yet
*
* @author 爱酱油不爱醋 * @author 爱酱油不爱醋
*/ */
@Component("hmSpider") @Component("hmSpider")
...@@ -49,6 +50,18 @@ public class HmSpider implements IItemSpider { ...@@ -49,6 +50,18 @@ public class HmSpider implements IItemSpider {
return resultObj; return resultObj;
} }
public static void main(String[] args) throws Exception {
String targetUrl = "https://www2.hm.com/zh_cn/productpage.0754698003.html";
String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.ZARA.getValue());
String detailStr = JsoupUtil.getScriptContent(content, "productArticleDetails");
int firstBrackets=detailStr.indexOf("{");
int lastbrackets=detailStr.lastIndexOf("}");
String resultStr = detailStr.substring(firstBrackets,lastbrackets+1);
resultStr = resultStr.replace("isDesktop ? ", "");
String regexp = "\'";
resultStr = resultStr.replaceAll(regexp, "\"");
JSONObject resultObj = JSONObject.fromObject(resultStr);
System.err.println(resultObj);
}
} }
package com.diaoyun.zion.chinafrica.bis.impl; package com.diaoyun.zion.chinafrica.bis.impl;
public class LeviSpider { import com.diaoyun.zion.chinafrica.bis.IItemSpider;
} import com.diaoyun.zion.chinafrica.enums.PlatformEnum;
import com.diaoyun.zion.chinafrica.vo.ProductResponse;
import com.diaoyun.zion.master.util.HttpClientUtil;
import com.diaoyun.zion.master.util.JsoupUtil;
import com.diaoyun.zion.master.util.TranslateHelper;
import com.diaoyun.zion.master.util.spider.LeviSpiderParse;
import net.sf.json.JSONObject;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;
import java.io.IOException;
import java.net.URISyntaxException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;
/**
 * Levi (李维斯) data spider
 *
 * @author 爱酱油不爱醋
 */
@Component("leviSpider")
public class LeviSpider implements IItemSpider {
    private static final Logger logger = LoggerFactory.getLogger(LeviSpider.class);
    /**
     * Levi (李维斯) product detail page URL
     */
    private static final String LEVI_URL = "https://www.levi.com.cn/product/";
    /**
     * Crawls a Levi (李维斯) product detail page.
     * @see LeviSpiderParse#formatProductResponse data formatting method
     * @param targetUrl URL of the product detail page
     * @return formatted and translated JSON data
     */
    @Override
    public JSONObject captureItem(String targetUrl) throws InterruptedException, IOException, ExecutionException, URISyntaxException, TimeoutException {
        String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.LEVI.getValue());
        JSONObject dataMap = JsoupUtil.getItemDetailByName(content, "window.__INITIAL_STATE__");
        ProductResponse productResponse = LeviSpiderParse.formatProductResponse(dataMap);
        JSONObject resultObj = JSONObject.fromObject(productResponse);
        TranslateHelper.translateProductResponse(resultObj);
        return resultObj;
    }
}
\ No newline at end of file
...@@ -25,8 +25,6 @@ import java.util.concurrent.TimeoutException; ...@@ -25,8 +25,6 @@ import java.util.concurrent.TimeoutException;
*/ */
@Component("lilySpider") @Component("lilySpider")
public class LilySpider implements IItemSpider { public class LilySpider implements IItemSpider {
private static Logger logger = LoggerFactory.getLogger(PullandbearSpider.class); private static Logger logger = LoggerFactory.getLogger(PullandbearSpider.class);
/** /**
...@@ -43,7 +41,6 @@ public class LilySpider implements IItemSpider { ...@@ -43,7 +41,6 @@ public class LilySpider implements IItemSpider {
@Override @Override
public JSONObject captureItem(String targetUrl) throws InterruptedException, IOException, ExecutionException, URISyntaxException, TimeoutException { public JSONObject captureItem(String targetUrl) throws InterruptedException, IOException, ExecutionException, URISyntaxException, TimeoutException {
return null; return null;
} }
......
...@@ -5,7 +5,6 @@ import com.diaoyun.zion.chinafrica.enums.PlatformEnum; ...@@ -5,7 +5,6 @@ import com.diaoyun.zion.chinafrica.enums.PlatformEnum;
import com.diaoyun.zion.chinafrica.vo.ProductResponse; import com.diaoyun.zion.chinafrica.vo.ProductResponse;
import com.diaoyun.zion.master.util.HttpClientUtil; import com.diaoyun.zion.master.util.HttpClientUtil;
import com.diaoyun.zion.master.util.spider.MassimoDuttiSpiderParse; import com.diaoyun.zion.master.util.spider.MassimoDuttiSpiderParse;
import com.diaoyun.zion.master.util.spider.SpiderUtil;
import com.diaoyun.zion.master.util.TranslateHelper; import com.diaoyun.zion.master.util.TranslateHelper;
import net.sf.json.JSONObject; import net.sf.json.JSONObject;
import org.slf4j.Logger; import org.slf4j.Logger;
...@@ -20,8 +19,6 @@ import java.util.concurrent.TimeoutException; ...@@ -20,8 +19,6 @@ import java.util.concurrent.TimeoutException;
/** /**
* Massimo Dutti data spider * Massimo Dutti data spider
* *
* TODO: passing the path from the App has some issues
*
* @author 爱酱油不爱醋 * @author 爱酱油不爱醋
*/ */
@Component("massimoduttiSpider") @Component("massimoduttiSpider")
......
package com.diaoyun.zion.chinafrica.bis.impl; package com.diaoyun.zion.chinafrica.bis.impl;
public class MocoSpider { import com.diaoyun.zion.chinafrica.bis.IItemSpider;
import com.diaoyun.zion.chinafrica.enums.PlatformEnum;
import com.diaoyun.zion.chinafrica.vo.ProductResponse;
import com.diaoyun.zion.master.util.HttpClientUtil;
import com.diaoyun.zion.master.util.TranslateHelper;
import com.diaoyun.zion.master.util.spider.MocoSpiderParse;
import net.sf.json.JSONObject;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;
import java.io.IOException;
import java.net.URISyntaxException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;
/**
 * MO&Co. data spider
 *
 * @author 爱酱油不爱醋
 */
@Component("mocoSpider")
public class MocoSpider implements IItemSpider {
    private static final Logger logger = LoggerFactory.getLogger(MocoSpider.class);
    /**
     * MO&Co. product detail page URL
     */
    private static final String MOCO_URL = "https://www.moco.com/moco/zh/p/";
    /**
     * Crawls a MO&Co. product detail page.
     * @see MocoSpiderParse#formatProductResponse data formatting method
     * @param targetUrl URL of the product detail page
     * @return formatted and translated JSON data
     */
    @Override
    public JSONObject captureItem(String targetUrl) throws URISyntaxException, IOException, ExecutionException, InterruptedException, TimeoutException {
        // Extract the product ID from the link
        String[] parts = targetUrl.split("/p/");
        String pId = parts[1];
        targetUrl = "https://www.moco.com/moco/zh/ajax/variant/" + pId;
        // Fetch the main product content through the API
        String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.MOCO.getValue());
        JSONObject resultObj = JSONObject.fromObject(content);
        // Format the data
        ProductResponse productResponse = MocoSpiderParse.formatProductResponse(resultObj, pId);
        resultObj = JSONObject.fromObject(productResponse);
        // Translate the data
        TranslateHelper.translateProductResponse(resultObj);
        return resultObj;
    }
} }
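The product ID lookup in `MocoSpider.captureItem` takes everything after `/p/`, so a URL with a query string or without the `/p/` segment would yield a wrong ID or an `ArrayIndexOutOfBoundsException`. A slightly more defensive extraction could be sketched as follows. `MocoProductId` and `fromUrl` are hypothetical names, and the product ID in the demo is made up; the real ID format on moco.com is not confirmed by this diff.

```java
import java.util.Optional;

/** Hypothetical helper, not part of the project. */
final class MocoProductId {
    private static final String MARKER = "/p/";

    /** Returns the path segment right after "/p/", without query string or fragment. */
    static Optional<String> fromUrl(String targetUrl) {
        if (targetUrl == null) {
            return Optional.empty();
        }
        int at = targetUrl.indexOf(MARKER);
        if (at < 0) {
            return Optional.empty();
        }
        String tail = targetUrl.substring(at + MARKER.length());
        // Cut at the first query, fragment, or path separator, whichever comes first
        int end = tail.length();
        for (char stop : new char[] {'?', '#', '/'}) {
            int idx = tail.indexOf(stop);
            if (idx >= 0 && idx < end) {
                end = idx;
            }
        }
        String id = tail.substring(0, end);
        return id.isEmpty() ? Optional.empty() : Optional.of(id);
    }

    public static void main(String[] args) {
        // "MBO1234" is an invented ID used only for the demo
        System.out.println(fromUrl("https://www.moco.com/moco/zh/p/MBO1234?color=red"));
        System.out.println(fromUrl("https://www.moco.com/moco/zh/c/BS_DISCOUNT"));
    }
}
```

Returning an `Optional` lets the caller decide how to report an unsupported link instead of failing with an index error.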
...@@ -7,9 +7,6 @@ import com.diaoyun.zion.master.util.HttpClientUtil; ...@@ -7,9 +7,6 @@ import com.diaoyun.zion.master.util.HttpClientUtil;
import com.diaoyun.zion.master.util.TranslateHelper; import com.diaoyun.zion.master.util.TranslateHelper;
import com.diaoyun.zion.master.util.spider.OchirlySpiderParse; import com.diaoyun.zion.master.util.spider.OchirlySpiderParse;
import net.sf.json.JSONObject; import net.sf.json.JSONObject;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import org.slf4j.Logger; import org.slf4j.Logger;
import org.slf4j.LoggerFactory; import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component; import org.springframework.stereotype.Component;
...@@ -17,8 +14,6 @@ import org.springframework.stereotype.Component; ...@@ -17,8 +14,6 @@ import org.springframework.stereotype.Component;
import java.io.IOException; import java.io.IOException;
import java.net.URISyntaxException; import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException; import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException; import java.util.concurrent.TimeoutException;
...@@ -53,43 +48,4 @@ public class OchirlySpider implements IItemSpider { ...@@ -53,43 +48,4 @@ public class OchirlySpider implements IItemSpider {
return resultObj; return resultObj;
} }
public static void main(String[] args) throws Exception {
String targetUrl = "http://www.ochirly.com.cn/p/mobile/1ZY4070820410.shtml";
String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.UNDERARMOUR.getValue());
Document document = Jsoup.parse(content);
// Get the title
Elements detailEle = document.select("div[class=detail]").select("div[class=desc]");
String pTitle = detailEle.select("h5").text();
System.err.println(pTitle);
// Get the price
Elements priceEle = detailEle.select("p[class=price]");
String pPrice = priceEle.attr("data-list-price");
System.err.println(pPrice);
// Get the color IDs and images
Elements colorEle = document.select("div[class=color]").select("ul[class=clearfix]");
List<String> imgUrlList = colorEle.select("a").eachAttr("href");
List<String> colorNoList = new ArrayList<>();
for (int i = 0; i < imgUrlList.size(); i++) {
String hrefStr = imgUrlList.get(i);
if (hrefStr.contains("/p/mobile/")) {
String[] spilt = hrefStr.split("/mobile/");
colorNoList.add(spilt[1].replaceAll(".shtml", ""));
} else {
colorNoList.add(0, priceEle.attr("data-sku"));
}
}
System.err.println(colorNoList);
List<String> pImgList = colorEle.select("img").eachAttr("src");
System.err.println(pImgList);
// Get the sizes
Elements sizeEle = document.select("div[class=size]").select("div[class=size_contain]").select("li");
Elements delEle = sizeEle.select("del").remove();
System.out.println(delEle);
List<String> pSizeList = sizeEle.eachText();
List<String> pSizeNoList = sizeEle.eachAttr("data-size-id");
System.err.println(pSizeList);
System.err.println(pSizeNoList);
}
} }
\ No newline at end of file
...@@ -8,9 +8,6 @@ import com.diaoyun.zion.master.util.TranslateHelper; ...@@ -8,9 +8,6 @@ import com.diaoyun.zion.master.util.TranslateHelper;
import com.diaoyun.zion.master.util.spider.UnderArmourSpiderParse; import com.diaoyun.zion.master.util.spider.UnderArmourSpiderParse;
import net.sf.json.JSONObject; import net.sf.json.JSONObject;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import org.slf4j.Logger; import org.slf4j.Logger;
import org.slf4j.LoggerFactory; import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component; import org.springframework.stereotype.Component;
......
...@@ -4,12 +4,8 @@ import com.diaoyun.zion.chinafrica.bis.IItemSpider; ...@@ -4,12 +4,8 @@ import com.diaoyun.zion.chinafrica.bis.IItemSpider;
import com.diaoyun.zion.chinafrica.enums.PlatformEnum; import com.diaoyun.zion.chinafrica.enums.PlatformEnum;
import com.diaoyun.zion.chinafrica.vo.ProductResponse; import com.diaoyun.zion.chinafrica.vo.ProductResponse;
import com.diaoyun.zion.master.util.HttpClientUtil; import com.diaoyun.zion.master.util.HttpClientUtil;
import com.diaoyun.zion.master.util.JsoupUtil;
import com.diaoyun.zion.master.util.SpiderUtil;
import com.diaoyun.zion.master.util.TranslateHelper; import com.diaoyun.zion.master.util.TranslateHelper;
import com.google.gson.JsonArray; import com.diaoyun.zion.master.util.spider.ZaraSpiderParse;
import com.google.gson.JsonObject;
import net.sf.json.JSONArray;
import net.sf.json.JSONObject; import net.sf.json.JSONObject;
import org.slf4j.Logger; import org.slf4j.Logger;
import org.slf4j.LoggerFactory; import org.slf4j.LoggerFactory;
...@@ -17,8 +13,6 @@ import org.springframework.stereotype.Component; ...@@ -17,8 +13,6 @@ import org.springframework.stereotype.Component;
import java.io.IOException; import java.io.IOException;
import java.net.URISyntaxException; import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException; import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException; import java.util.concurrent.TimeoutException;
...@@ -39,8 +33,8 @@ public class ZaraSpider implements IItemSpider { ...@@ -39,8 +33,8 @@ public class ZaraSpider implements IItemSpider {
/** /**
* Zara data spider * Zara data spider
* @see com.diaoyun.zion.chinafrica.service.impl.SpiderServiceImpl# modifies the product detail page path * @see com.diaoyun.zion.chinafrica.service.impl.SpiderServiceImpl# modifies the product detail page path
* @see JsoupUtil#getZaraJsonData returns the extracted main product data * @see ZaraSpiderParse#getJsonData returns the extracted main product data
* @see SpiderUtil#formatZaraProductResponse data formatting method * @see ZaraSpiderParse#formatProductResponse data formatting method
* @param targetUrl URL of the product detail page * @param targetUrl URL of the product detail page
* @return formatted and translated JSON data * @return formatted and translated JSON data
*/ */
...@@ -48,24 +42,12 @@ public class ZaraSpider implements IItemSpider { ...@@ -48,24 +42,12 @@ public class ZaraSpider implements IItemSpider {
public JSONObject captureItem(String targetUrl) throws URISyntaxException, IOException, ExecutionException, InterruptedException, TimeoutException { public JSONObject captureItem(String targetUrl) throws URISyntaxException, IOException, ExecutionException, InterruptedException, TimeoutException {
JSONObject resultObj; JSONObject resultObj;
String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.ZARA.getValue()); String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.ZARA.getValue());
resultObj = JsoupUtil.getZaraJsonData(content); resultObj = ZaraSpiderParse.getJsonData(content);
ProductResponse productResponse = SpiderUtil.formatZaraProductResponse(resultObj); ProductResponse productResponse = ZaraSpiderParse.formatProductResponse(resultObj);
resultObj = JSONObject.fromObject(productResponse); resultObj = JSONObject.fromObject(productResponse);
TranslateHelper.translateProductResponse(resultObj); TranslateHelper.translateProductResponse(resultObj);
return resultObj; return resultObj;
} }
public static void main(String[] args) throws Exception {
String targetUrl = "https://www.nike.com/cn/t/air-max-90-betrue-%E7%94%B7%E5%AD%90%E8%BF%90%E5%8A%A8%E9%9E%8B-dgC1X4/CJ5482-100";
String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.NIKE.getValue());
JSONObject detail = JsoupUtil.getItemDetailByName(content, "window.INITIAL_REDUX_STATE");
System.err.println(detail);
JSONObject object = detail.getJSONObject("Threads").getJSONObject("products").getJSONObject("CJ5482-100");
String fullPrice = object.getString("fullPrice");
String currentPrice = object.getString("currentPrice");
System.out.println(fullPrice);
System.out.println(currentPrice);
}
} }
...@@ -26,10 +26,12 @@ public enum OrderStatusEnum implements EnumItemable<OrderStatusEnum> { ...@@ -26,10 +26,12 @@ public enum OrderStatusEnum implements EnumItemable<OrderStatusEnum> {
this.value = value; this.value = value;
} }
@Override
public String getLabel() { public String getLabel() {
return this.label; return this.label;
} }
@Override
public Integer getValue() { public Integer getValue() {
return this.value; return this.value;
} }
......
...@@ -4,12 +4,15 @@ package com.diaoyun.zion.chinafrica.enums; ...@@ -4,12 +4,15 @@ package com.diaoyun.zion.chinafrica.enums;
import com.diaoyun.zion.master.enums.EnumItemable; import com.diaoyun.zion.master.enums.EnumItemable;
/** /**
* Platform type * Spider enum
* *
* @author G * @author G
*/ */
public enum PlatformEnum implements EnumItemable<PlatformEnum> { public enum PlatformEnum implements EnumItemable<PlatformEnum> {
/**
* Enum constants, one per spider
*/
TB("淘宝", "tb"), TB("淘宝", "tb"),
TM("天猫", "tm"), TM("天猫", "tm"),
PULLANDBEAR("Pullandbear","pullandbear"), PULLANDBEAR("Pullandbear","pullandbear"),
...@@ -23,6 +26,9 @@ public enum PlatformEnum implements EnumItemable<PlatformEnum> { ...@@ -23,6 +26,9 @@ public enum PlatformEnum implements EnumItemable<PlatformEnum> {
URBANREVIVO("UrbanRevivo", "urbanrevivo"), URBANREVIVO("UrbanRevivo", "urbanrevivo"),
UNDERARMOUR("安德玛", "underarmour"), UNDERARMOUR("安德玛", "underarmour"),
OCHIRLY("Ochirly", "ochirly"), OCHIRLY("Ochirly", "ochirly"),
ESPRIT("思捷", "esprit"),
LEVI("李维斯", "levi"),
MOCO("MO&Co.", "moco"),
MASSIMODUTTI("MassimoDutti", "massimodutti"), MASSIMODUTTI("MassimoDutti", "massimodutti"),
UN("未知", "un"), UN("未知", "un"),
AfriEshop("afri-eshop","afri-eshop" ); AfriEshop("afri-eshop","afri-eshop" );
......
...@@ -5,10 +5,17 @@ import com.diaoyun.zion.chinafrica.enums.PlatformEnum; ...@@ -5,10 +5,17 @@ import com.diaoyun.zion.chinafrica.enums.PlatformEnum;
import com.diaoyun.zion.master.util.SpringContextUtil; import com.diaoyun.zion.master.util.SpringContextUtil;
/** /**
* Item spider * Item spider factory class
*
* @author G
*/ */
public class ItemSpiderFactory { public class ItemSpiderFactory {
/**
* Selects the spider that corresponds to the received URL
* @param platformEnum spider enum
* @return the spider implementation
*/
public static IItemSpider getSpider(PlatformEnum platformEnum) { public static IItemSpider getSpider(PlatformEnum platformEnum) {
IItemSpider iItemSpider; IItemSpider iItemSpider;
switch (platformEnum.getValue()) { switch (platformEnum.getValue()) {
...@@ -64,6 +71,18 @@ public class ItemSpiderFactory { ...@@ -64,6 +71,18 @@ public class ItemSpiderFactory {
iItemSpider = (IItemSpider) SpringContextUtil.getBean("ochirlySpider"); iItemSpider = (IItemSpider) SpringContextUtil.getBean("ochirlySpider");
break; break;
} }
case "esprit": {
iItemSpider = (IItemSpider) SpringContextUtil.getBean("espritSpider");
break;
}
case "levi": {
iItemSpider = (IItemSpider) SpringContextUtil.getBean("leviSpider");
break;
}
case "moco": {
iItemSpider = (IItemSpider) SpringContextUtil.getBean("mocoSpider");
break;
}
case "massimodutti": { case "massimodutti": {
iItemSpider = (IItemSpider) SpringContextUtil.getBean("massimoduttiSpider"); iItemSpider = (IItemSpider) SpringContextUtil.getBean("massimoduttiSpider");
break; break;
......
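Each new platform adds another `case` to the switch in `ItemSpiderFactory.getSpider`. One alternative, sketched here under the assumption that every spider can be keyed by its platform value, is a map-based registry. `ItemSpider` and `SpiderRegistry` are hypothetical stand-ins and not the project's `IItemSpider` or factory; in a Spring context, injecting a `Map<String, IItemSpider>` keyed by bean name could play a similar role.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the project's IItemSpider interface. */
interface ItemSpider {
    String captureItem(String targetUrl);
}

/** Hypothetical registry: platform value -> spider, in place of the switch. */
final class SpiderRegistry {
    private final Map<String, ItemSpider> spiders = new HashMap<>();

    void register(String platformValue, ItemSpider spider) {
        spiders.put(platformValue, spider);
    }

    ItemSpider get(String platformValue) {
        ItemSpider spider = spiders.get(platformValue);
        if (spider == null) {
            throw new IllegalArgumentException("No spider registered for platform: " + platformValue);
        }
        return spider;
    }

    public static void main(String[] args) {
        SpiderRegistry registry = new SpiderRegistry();
        // Lambdas stand in for real spider beans in this demo
        registry.register("esprit", url -> "esprit:" + url);
        registry.register("levi", url -> "levi:" + url);
        registry.register("moco", url -> "moco:" + url);
        System.out.println(registry.get("levi").captureItem("https://www.levi.com.cn/product/1"));
    }
}
```

With this shape, adding a platform means one `register` call rather than a new `case` block.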
...@@ -71,6 +71,12 @@ public class SpiderServiceImpl implements SpiderService { ...@@ -71,6 +71,12 @@ public class SpiderServiceImpl implements SpiderService {
platformEnum=PlatformEnum.UNDERARMOUR; platformEnum=PlatformEnum.UNDERARMOUR;
} else if(targetUrl.contains("ochirly.com.cn/p/mobile/")) { } else if(targetUrl.contains("ochirly.com.cn/p/mobile/")) {
platformEnum=PlatformEnum.OCHIRLY; platformEnum=PlatformEnum.OCHIRLY;
} else if(targetUrl.contains("esprit.cn/product/")) {
platformEnum=PlatformEnum.ESPRIT;
} else if(targetUrl.contains("levi.com.cn/product/")) {
platformEnum=PlatformEnum.LEVI;
} else if(targetUrl.contains("moco.com/moco/zh/p/")) {
platformEnum=PlatformEnum.MOCO;
} else if (targetUrl.contains("massimodutti.cn") && targetUrl.contains("colorId") && targetUrl.contains("categoryId")) { } else if (targetUrl.contains("massimodutti.cn") && targetUrl.contains("colorId") && targetUrl.contains("categoryId")) {
platformEnum = PlatformEnum.MASSIMODUTTI; platformEnum = PlatformEnum.MASSIMODUTTI;
} }
......
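The `else if` chain in `SpiderServiceImpl` maps URL fragments to platforms one branch at a time. The same matching can be sketched as an ordered rule table, which keeps the fragments in one place. This is only an illustration: `PlatformResolver` is a hypothetical class, it returns the platform value strings from `PlatformEnum` rather than the enum itself, and it covers only the single-fragment rules visible in this hunk (the Massimo Dutti rule needs three fragments and is left out).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical resolver, not part of the project. */
final class PlatformResolver {
    // Insertion order is the matching priority
    private static final Map<String, String> RULES = new LinkedHashMap<>();

    static {
        RULES.put("ochirly.com.cn/p/mobile/", "ochirly");
        RULES.put("esprit.cn/product/", "esprit");
        RULES.put("levi.com.cn/product/", "levi");
        RULES.put("moco.com/moco/zh/p/", "moco");
    }

    /** Returns the platform value for the URL, or "un" (unknown) if no rule matches. */
    static String resolve(String targetUrl) {
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            if (targetUrl.contains(rule.getKey())) {
                return rule.getValue();
            }
        }
        return "un";
    }

    public static void main(String[] args) {
        System.out.println(resolve("https://www.levi.com.cn/product/12345"));
        System.out.println(resolve("https://example.com/"));
    }
}
```

A `LinkedHashMap` is used so that more specific fragments can be listed first if two rules ever overlap.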
package com.diaoyun.zion.master.util; package com.diaoyun.zion.master.util;
import com.diaoyun.zion.chinafrica.enums.PlatformEnum;
import net.sf.json.JSONObject; import net.sf.json.JSONObject;
import org.apache.commons.text.StringEscapeUtils; import org.apache.commons.text.StringEscapeUtils;
import org.jsoup.Jsoup; import org.jsoup.Jsoup;
...@@ -11,8 +10,6 @@ import org.jsoup.select.Elements; ...@@ -11,8 +10,6 @@ import org.jsoup.select.Elements;
import org.slf4j.Logger; import org.slf4j.Logger;
import org.slf4j.LoggerFactory; import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.net.URISyntaxException;
import java.util.ArrayList; import java.util.ArrayList;
import java.util.HashMap; import java.util.HashMap;
import java.util.List; import java.util.List;
...@@ -104,6 +101,7 @@ public class JsoupUtil { ...@@ -104,6 +101,7 @@ public class JsoupUtil {
} }
return configGroup; return configGroup;
} }
/** /**
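* Gets the content of a script variable by its name; specific to Tmall responses. Not very useful: it has no product specification info * Gets the content of a script variable by its name; specific to Tmall responses. Not very useful: it has no product specification info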
* *
...@@ -134,6 +132,12 @@ public class JsoupUtil { ...@@ -134,6 +132,12 @@ public class JsoupUtil {
return configGroup; return configGroup;
} }
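/**
* Extracts the requested fields from a JavaScript source string.
* @param needInfo names of the fields to extract
* @param configStr JavaScript source string to search
* @param returnMap map that receives the extracted values
*/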
private static void getInfoFromJsStr(List<String> needInfo, String configStr, Map<String, String> returnMap) { private static void getInfoFromJsStr(List<String> needInfo, String configStr, Map<String, String> returnMap) {
for (String info : needInfo) { for (String info : needInfo) {
// Get the related info // Get the related info
...@@ -231,42 +235,6 @@ public class JsoupUtil { ...@@ -231,42 +235,6 @@ public class JsoupUtil {
return dataMap; return dataMap;
} }
/**
* Gets the main data for the H&M spider
* @param content
* @throws Exception
*/
public static JSONObject getHMJsonData(String content) {
String detailStr = getScriptContent(content, "productArticleDetails");
int firstBrackets=detailStr.indexOf("{");
int lastbrackets=detailStr.lastIndexOf("}");
String resultStr = detailStr.substring(firstBrackets,lastbrackets+1);
int firstImage = detailStr.indexOf("'images':[");
int lastImage = detailStr.lastIndexOf("'video':");
detailStr = detailStr.substring(firstImage, lastImage);
resultStr = resultStr.replace(detailStr, "");
JSONObject resultObj = JSONObject.fromObject(resultStr);
return resultObj;
}
public static void main(String[] args) throws Exception {
String targetUrl = "https://www2.hm.com/zh_cn/productpage.0754698003.html";
String content = HttpClientUtil.getContentByUrl(targetUrl, PlatformEnum.ZARA.getValue());
String detailStr = getScriptContent(content, "productArticleDetails");
int firstBrackets=detailStr.indexOf("{");
int lastbrackets=detailStr.lastIndexOf("}");
String resultStr = detailStr.substring(firstBrackets,lastbrackets+1);
resultStr = resultStr.replace("isDesktop ? ", "");
String regexp = "\'";
resultStr = resultStr.replaceAll(regexp, "\"");
JSONObject resultObj = JSONObject.fromObject(resultStr);
System.err.println(resultObj);
}
/** /**
* Gets content by script id * Gets content by script id
* @param content * @param content
......
package com.diaoyun.zion.master.util.spider; package com.diaoyun.zion.master.util.spider;
import com.diaoyun.zion.chinafrica.enums.PlatformEnum;
import com.diaoyun.zion.chinafrica.vo.*;
import net.sf.json.JSONArray;
import net.sf.json.JSONObject;
import java.util.*;
import static com.diaoyun.zion.master.util.spider.SpiderUtil.exchangeRate;
/**
* Esprit spider data parsing
* @see com.diaoyun.zion.chinafrica.bis.impl.EspritSpider
* @author 爱酱油不爱醋
*/
public class EspritSpiderParse { public class EspritSpiderParse {
    /**
     * Formats the response data.
     * @param dataMap the main JSON data
     * @return the formatted data
     */
    public static ProductResponse formatProductResponse(JSONObject dataMap) {
        // Create the response wrapper
        ProductResponse productResponse = new ProductResponse();
        // Properties: Esprit products have a color and a size
        Map<String, Set<ProductProp>> productPropSet = new HashMap<>(16);
        // Original prices
        List<OriginalPrice> originalPriceList = new ArrayList<>();
        // Promotional prices
        List<ProductPromotion> promotionList = new ArrayList<>();
        // Stock
        DynStock dynStock = new DynStock();
        // The data carries no exact stock count, so default to an ample quantity
        dynStock.setSellableQuantity(9999);
        // Take the details node under product
        JSONObject detailsObj = dataMap.getJSONObject("product").getJSONObject("details");
        //////////////////////////////////// Basic product info ////////////////////////////
        ItemInfo itemInfo = new ItemInfo();
        itemInfo.setShopName(PlatformEnum.ESPRIT.getLabel());
        itemInfo.setShopUrl("https://www.esprit.cn");
        itemInfo.setItemId(detailsObj.getString("code"));
        itemInfo.setTitle(detailsObj.getString("title"));
        //////////////////////////////////// Basic product info (image is set below) END /////////////////////////
        //////////////////////////////////// Product color property ////////////////////////////////////////////
        // Take the values array of options[0]
        JSONArray values_0_Arr = detailsObj.getJSONArray("options").getJSONObject(0).getJSONArray("values");
        List<String> colorNoList = new ArrayList<>();
        Set<ProductProp> propSet = new HashSet<>(16);
        for (int i = 0; i < values_0_Arr.size(); i++) {
            JSONObject values_0_Obj = values_0_Arr.getJSONObject(i);
            // Get the image URL
            String imageUrl = values_0_Obj.getJSONArray("images").getJSONObject(0).getString("url");
            // Use the first image as the product's main picture
            if (i == 0) {
                itemInfo.setPic(imageUrl);
            }
            ProductProp productPropColor = new ProductProp();
            String colorNo = values_0_Obj.getString("code");
            colorNoList.add(colorNo);
            productPropColor.setPropId(colorNo);
            productPropColor.setPropName(values_0_Obj.getString("displayName"));
            productPropColor.setImage(imageUrl);
            propSet.add(productPropColor);
            if (productPropSet.get("颜色") == null) {
                productPropSet.put("颜色", propSet);
            } else {
                Set<ProductProp> oldPropSet = productPropSet.get("颜色");
                propSet.addAll(oldPropSet);
                productPropSet.put("颜色", propSet);
            }
        }
        //////////////////////////////////// Product color property END ////////////////////////////////////////////
        ///////////////////////// Product size property ////////////////////////////////////////////////////////////////
        // Take the values array of options[1]
        List<String> sizeNoList = new ArrayList<>();
        Set<ProductProp> sizePropSet = new HashSet<>();
        JSONArray values_1_Arr = detailsObj.getJSONArray("options").getJSONObject(1).getJSONArray("values");
        for (int i = 0; i < values_1_Arr.size(); i++) {
            JSONObject values_1_Obj = values_1_Arr.getJSONObject(i);
            ProductProp productPropSize = new ProductProp();
            String sizeNo = values_1_Obj.getString("code");
            productPropSize.setPropId(sizeNo);
            sizeNoList.add(sizeNo);
            productPropSize.setPropName(values_1_Obj.getString("displayName"));
            sizePropSet.add(productPropSize);
            if (productPropSet.get("尺码") == null) {
                productPropSet.put("尺码", sizePropSet);
            } else {
                Set<ProductProp> oldPropSet = productPropSet.get("尺码");
                sizePropSet.addAll(oldPropSet);
                productPropSet.put("尺码", sizePropSet);
            }
        }
        ///////////////////////// Product size property END /////////////////////////////////////////////////////
for (String colorNo : colorNoList) {
for (String sizeNo : sizeNoList) {
// Build skuStr, the key that ties stock and price to one color/size pair
String skuStr = ";" + colorNo + ";" + sizeNo + ";";
//////////////////////////////////// Stock ////////////////////////////////////////////
// Flag: the product carries stock information
productResponse.setStockFlag(true);
List<ProductSkuStock> productSkuStockList = dynStock.getProductSkuStockList();
if (productSkuStockList == null) {
productSkuStockList = new ArrayList<>();
}
ProductSkuStock productSkuStock = new ProductSkuStock();
// Sellable quantity: the data has no usable stock figure, so use a placeholder
productSkuStock.setSellableQuantity(999);
// The SKU id this stock entry belongs to
productSkuStock.setSkuStr(skuStr);
productSkuStockList.add(productSkuStock);
dynStock.setProductSkuStockList(productSkuStockList);
//////////////////////////////////// Stock END /////////////////////////////////////////
//////////////////////////////////// Original price //////////////////////////////////
OriginalPrice originalPrice = new OriginalPrice();
// Read the product's original price
String fullPrice = detailsObj.getJSONObject("salePrice").getString("amount");
// TODO convert the exchange rate; the price is currently in RMB
fullPrice = exchangeRate(fullPrice);
originalPrice.setPrice(fullPrice);
productResponse.setPrice(fullPrice);
productResponse.setSalePrice(fullPrice + "-" + fullPrice);
originalPrice.setSkuStr(skuStr);
originalPriceList.add(originalPrice);
//////////////////////////////////// Original price END //////////////////////////////////
}
}
productResponse.setPropFlag(true);
productResponse.setProductPropSet(productPropSet);
productResponse.setPlatform(PlatformEnum.ESPRIT.getValue());
productResponse.setPromotionList(promotionList);
productResponse.setOriginalPriceList(originalPriceList);
productResponse.setItemInfo(itemInfo);
productResponse.setDynStock(dynStock);
return productResponse;
}
}
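The color and size loops in the parser above repeat the same get / null-check / put / addAll sequence on `productPropSet`. As a minimal sketch, the same grouping can be written with `Map.computeIfAbsent`. It uses plain `String` values in place of `ProductProp`, and the names `PropGrouping` and `addProp` are hypothetical, not part of this commit.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class PropGrouping {
    /** Adds one value to the set stored under key, creating the set on first use. */
    static void addProp(Map<String, Set<String>> props, String key, String value) {
        props.computeIfAbsent(key, k -> new HashSet<>()).add(value);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> props = new HashMap<>();
        // Illustrative values only; the real code stores ProductProp objects.
        addProp(props, "颜色", "black");
        addProp(props, "颜色", "blue");
        addProp(props, "尺码", "M");
        System.out.println(props.get("颜色").size()); // prints 2
    }
}
```

With this shape the map entry is created once and every later value lands in the same set, so no merge branch is needed.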
package com.diaoyun.zion.master.util.spider;
import com.diaoyun.zion.chinafrica.enums.PlatformEnum;
import com.diaoyun.zion.chinafrica.vo.*;
import net.sf.json.JSONArray;
import net.sf.json.JSONObject;
import java.util.*;
import static com.diaoyun.zion.master.util.spider.SpiderUtil.exchangeRate;
/**
* Data parser for the Levi (李维斯) crawler
* @see com.diaoyun.zion.chinafrica.bis.impl.LeviSpider
* @author 爱酱油不爱醋
*/
public class LeviSpiderParse {
/**
* Formats the response data
* @param dataMap the main JSON data
* @return the formatted data
*/
public static ProductResponse formatProductResponse(JSONObject dataMap) {
// Create the response wrapper
ProductResponse productResponse = new ProductResponse();
// Properties: Levi products have a color and a size
Map<String, Set<ProductProp>> productPropSet = new HashMap<>(16);
// Original prices
List<OriginalPrice> originalPriceList = new ArrayList<>();
// Promotional prices
List<ProductPromotion> promotionList = new ArrayList<>();
// Stock
DynStock dynStock = new DynStock();
// The data carries no exact stock count, so default to an ample quantity
dynStock.setSellableQuantity(9999);
// Take the "details" object under "product"
JSONObject detailsObj = dataMap.getJSONObject("product").getJSONObject("details");
//////////////////////////////////// Basic item info ////////////////////////////
ItemInfo itemInfo = new ItemInfo();
itemInfo.setShopName(PlatformEnum.LEVI.getLabel());
itemInfo.setShopUrl("https://www.levi.com");
itemInfo.setItemId(detailsObj.getString("code"));
itemInfo.setTitle(detailsObj.getString("title"));
//////////////////////////////////// Basic item info END (the picture is set below) /////////////////////////
//////////////////////////////////// Color properties ////////////////////////////////////////////
// Take the "values" array of options[0]
JSONArray values_0_Arr = detailsObj.getJSONArray("options").getJSONObject(0).getJSONArray("values");
List<String> colorNoList = new ArrayList<>();
Set<ProductProp> propSet = new HashSet<>(16);
for (int i = 0; i < values_0_Arr.size(); i++) {
JSONObject values_0_Obj = values_0_Arr.getJSONObject(i);
// Get the image URL
String imageUrl = values_0_Obj.getJSONArray("images").getJSONObject(0).getString("url");
// Use the first color's image as the item's main picture
if (i == 0) {
itemInfo.setPic(imageUrl);
}
ProductProp productPropColor = new ProductProp();
String colorNo = values_0_Obj.getString("code");
colorNoList.add(colorNo);
productPropColor.setPropId(colorNo);
productPropColor.setPropName(values_0_Obj.getString("displayName"));
productPropColor.setImage(imageUrl);
propSet.add(productPropColor);
if (productPropSet.get("颜色") == null) {
productPropSet.put("颜色", propSet);
} else {
Set<ProductProp> oldPropSet = productPropSet.get("颜色");
propSet.addAll(oldPropSet);
productPropSet.put("颜色", propSet);
}
}
//////////////////////////////////// Color properties END ////////////////////////////////////////////
///////////////////////// Size properties ////////////////////////////////////////////////////////////////
// Take the "values" array of options[1]
List<String> sizeNoList = new ArrayList<>();
Set<ProductProp> sizePropSet = new HashSet<>();
JSONArray values_1_Arr = detailsObj.getJSONArray("options").getJSONObject(1).getJSONArray("values");
for (int i = 0; i < values_1_Arr.size(); i++) {
JSONObject values_1_Obj = values_1_Arr.getJSONObject(i);
ProductProp productPropSize = new ProductProp();
String sizeNo = values_1_Obj.getString("code");
productPropSize.setPropId(sizeNo);
sizeNoList.add(sizeNo);
productPropSize.setPropName(values_1_Obj.getString("displayName"));
sizePropSet.add(productPropSize);
if (productPropSet.get("尺码") == null) {
productPropSet.put("尺码", sizePropSet);
} else {
Set<ProductProp> oldPropSet = productPropSet.get("尺码");
sizePropSet.addAll(oldPropSet);
productPropSet.put("尺码", sizePropSet);
}
}
///////////////////////// Size properties END /////////////////////////////////////////////////////
for (String colorNo : colorNoList) {
for (String sizeNo : sizeNoList) {
// Build skuStr, the key that ties stock and price to one color/size pair
String skuStr = ";" + colorNo + ";" + sizeNo + ";";
//////////////////////////////////// Stock ////////////////////////////////////////////
// Flag: the product carries stock information
productResponse.setStockFlag(true);
List<ProductSkuStock> productSkuStockList = dynStock.getProductSkuStockList();
if (productSkuStockList == null) {
productSkuStockList = new ArrayList<>();
}
ProductSkuStock productSkuStock = new ProductSkuStock();
// Sellable quantity: the data has no usable stock figure, so use a placeholder
productSkuStock.setSellableQuantity(999);
// The SKU id this stock entry belongs to
productSkuStock.setSkuStr(skuStr);
productSkuStockList.add(productSkuStock);
dynStock.setProductSkuStockList(productSkuStockList);
//////////////////////////////////// Stock END /////////////////////////////////////////
//////////////////////////////////// Original price //////////////////////////////////
OriginalPrice originalPrice = new OriginalPrice();
// Read the product's original price
String fullPrice = detailsObj.getJSONObject("salePrice").getString("amount");
// TODO convert the exchange rate; the price is currently in RMB
fullPrice = exchangeRate(fullPrice);
originalPrice.setPrice(fullPrice);
productResponse.setPrice(fullPrice);
productResponse.setSalePrice(fullPrice + "-" + fullPrice);
originalPrice.setSkuStr(skuStr);
originalPriceList.add(originalPrice);
//////////////////////////////////// Original price END //////////////////////////////////
}
}
productResponse.setPropFlag(true);
productResponse.setProductPropSet(productPropSet);
productResponse.setPlatform(PlatformEnum.LEVI.getValue());
productResponse.setPromotionList(promotionList);
productResponse.setOriginalPriceList(originalPriceList);
productResponse.setItemInfo(itemInfo);
productResponse.setDynStock(dynStock);
return productResponse;
}
}
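The nested loops in `LeviSpiderParse` emit one stock entry and one price entry per color/size pair, keyed by a `;color;size;` string. The sketch below isolates only that key construction. The class name `SkuKeys` and the sample codes are hypothetical; the real parser reads the codes from the `options` array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SkuKeys {
    /** Builds one ";color;size;" key per color/size combination, colors outermost. */
    static List<String> build(List<String> colorNos, List<String> sizeNos) {
        List<String> keys = new ArrayList<>(colorNos.size() * sizeNos.size());
        for (String colorNo : colorNos) {
            for (String sizeNo : sizeNos) {
                keys.add(";" + colorNo + ";" + sizeNo + ";");
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // Made-up codes, for illustration only
        List<String> keys = build(Arrays.asList("C01", "C02"), Arrays.asList("S", "M"));
        System.out.println(keys); // prints [;C01;S;, ;C01;M;, ;C02;S;, ;C02;M;]
    }
}
```

Because every pair gets a key, combinations that the shop does not actually sell also appear with the placeholder stock of 999. Filtering them would need per-SKU data that, per the comments in the parser, the response does not include.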
@@ -16,6 +16,13 @@ import static com.diaoyun.zion.master.util.spider.SpiderUtil.exchangeRate;
 */
 public class LilySpiderParse {
+/**
+ * Formats the response data
+ * TODO unfinished
+ * @param content the main product content
+ * @param pId the product id
+ * @return the formatted data
+ */
 public static ProductResponse formatProductResponse(String content, String pId) {
 // Create the response wrapper
 ProductResponse productResponse = new ProductResponse();
@@ -42,78 +49,7 @@ public class LilySpiderParse {
 //////////////////////////////////// Basic item info ////////////////////////////
 ItemInfo itemInfo = new ItemInfo();
-itemInfo.setShopName(PlatformEnum.ADIDAS.getLabel());
-itemInfo.setShopUrl("https://www.adidas.com");
-itemInfo.setItemId(pId);
-itemInfo.setTitle(pTitle);
-itemInfo.setPic(pImg);
-//////////////////////////////////// Basic item info END /////////////////////////
-for (int i = 0; i < pSizeList.size(); i++) {
-// SKU id for the stock entry (color id + size id)
-String skuStr = ";" + pId + ";" + pSizeList.get(i) + ";";
-///////////////////////// Size properties ////////////////////
-Set<ProductProp> sizePropSet = new HashSet<>();
-ProductProp productPropSize = new ProductProp();
-productPropSize.setPropId(pSizeList.get(i));
-productPropSize.setPropName(pSizeList.get(i));
-sizePropSet.add(productPropSize);
-if (productPropSet.get("尺码") == null) {
-productPropSet.put("尺码", sizePropSet);
-} else {
-Set<ProductProp> oldPropSet = productPropSet.get("尺码");
-sizePropSet.addAll(oldPropSet);
-productPropSet.put("尺码", sizePropSet);
-}
-///////////////////////// Size properties END ////////////////////
-//////////////////////////////////// Stock ////////////////////////////////////////////
-// Flag: the product carries stock information
-productResponse.setStockFlag(true);
-List<ProductSkuStock> productSkuStockList = dynStock.getProductSkuStockList();
-if (productSkuStockList == null) {
-productSkuStockList = new ArrayList<>();
-}
-ProductSkuStock productSkuStock = new ProductSkuStock();
-// Sellable quantity: Zara has no usable stock data
-productSkuStock.setSellableQuantity(999);
-// The SKU id this stock entry belongs to
-productSkuStock.setSkuStr(skuStr);
-productSkuStockList.add(productSkuStock);
-dynStock.setProductSkuStockList(productSkuStockList);
-//////////////////////////////////// Stock END /////////////////////////////////////////
-//////////////////////////////////// Original price //////////////////////////////////
-OriginalPrice originalPrice = new OriginalPrice();
-// Read the product's original price
-// TODO convert the exchange rate; the price is currently in RMB
-fullPrice = exchangeRate(fullPrice);
-originalPrice.setPrice(fullPrice);
-productResponse.setPrice(fullPrice);
-productResponse.setSalePrice(fullPrice + "-" + fullPrice);
-originalPrice.setSkuStr(skuStr);
-originalPriceList.add(originalPrice);
-//////////////////////////////////// Original price END ///////////////////////////////
-}
-//////////////////////////////////// Color properties ////////////////////////////
-Set<ProductProp> propSet = new HashSet<>(16);
-ProductProp productPropColor = new ProductProp();
-// Color description
-productPropColor.setPropId(pId);
-productPropColor.setPropName(pColor);
-productPropColor.setImage(pImg);
-propSet.add(productPropColor);
-if (productPropSet.get("颜色") == null) {
-productPropSet.put("颜色", propSet);
-} else {
-Set<ProductProp> oldPropSet = productPropSet.get("颜色");
-propSet.addAll(oldPropSet);
-productPropSet.put("颜色", propSet);
-}
-//////////////////////////////////// Color properties END ////////////////////////////////////////////
 // Populate the JSON data in the following order
 productResponse.setPropFlag(true);
......
package com.diaoyun.zion.master.util.spider;
import com.diaoyun.zion.chinafrica.enums.PlatformEnum;
import com.diaoyun.zion.chinafrica.vo.*;
import net.sf.json.JSONArray;
import net.sf.json.JSONObject;
import java.util.*;
import static com.diaoyun.zion.master.util.spider.SpiderUtil.exchangeRate;
/**
* Data parser for the MO&Co. crawler
* @see com.diaoyun.zion.chinafrica.bis.impl.MocoSpider
* @author 爱酱油不爱醋
*/
public class MocoSpiderParse {
/**
* Formats the response data
* @param dataMap the main JSON content
* @param pId the extracted product id
* @return the formatted data
*/
public static ProductResponse formatProductResponse(JSONObject dataMap, String pId) {
// Create the response wrapper
ProductResponse productResponse = new ProductResponse();
// Properties: MO&Co. products have a color and a size
Map<String, Set<ProductProp>> productPropSet = new HashMap<>(16);
// Original prices
List<OriginalPrice> originalPriceList = new ArrayList<>();
// Promotional prices
List<ProductPromotion> promotionList = new ArrayList<>();
// Stock
DynStock dynStock = new DynStock();
// The data carries no exact stock count, so default to an ample quantity
dynStock.setSellableQuantity(9999);
// Take the "productData" object node
JSONObject productDataObj = dataMap.getJSONObject("productData");
//////////////////////////////////// Basic item info ////////////////////////////////////////////
ItemInfo itemInfo = new ItemInfo();
itemInfo.setShopName("MO&Co.");
itemInfo.setShopUrl("https://en.mo-co.com/");
itemInfo.setItemId(pId);
itemInfo.setTitle(productDataObj.getString("name"));
//////////////////////////////////// Basic item info END (the picture is set below) ////////////////////////////////////////////
//////////////////////////////////// Color properties ////////////////////////////////////////////
// Take the "options" array of baseOptions[1]
JSONArray options_1_Arr = productDataObj.getJSONArray("baseOptions").getJSONObject(1).getJSONArray("options");
List<String> colorNoList = new ArrayList<>();
Set<ProductProp> propSet = new HashSet<>(16);
for (int i = 0; i < options_1_Arr.size(); i++) {
JSONObject options_1_Obj = options_1_Arr.getJSONObject(i);
// Build the image URL from the part of the source URL that follows "_other_"
String[] splitImg = options_1_Obj.getJSONArray("variantOptionQualifiers")
.getJSONObject(0).getJSONObject("image").getString("url").split("_other_");
String imageUrl = "https://mallimg.moco.com/" + pId + "_list_" + splitImg[1];
// Use the first color's image as the item's main picture
if (i == 0) {
itemInfo.setPic(imageUrl);
}
ProductProp productPropColor = new ProductProp();
String colorNo = options_1_Obj.getString("epoColorCode");
colorNoList.add(colorNo);
productPropColor.setPropId(colorNo);
productPropColor.setPropName(options_1_Obj.getString("epoColorName"));
productPropColor.setImage(imageUrl);
propSet.add(productPropColor);
if (productPropSet.get("颜色") == null) {
productPropSet.put("颜色", propSet);
} else {
Set<ProductProp> oldPropSet = productPropSet.get("颜色");
propSet.addAll(oldPropSet);
productPropSet.put("颜色", propSet);
}
}
//////////////////////////////////// Color properties END ////////////////////////////////////////////
///////////////////////// Size properties ////////////////////////////////////////////////////////////////
// Take the "options" array of baseOptions[0]
List<String> sizeNoList = new ArrayList<>();
Set<ProductProp> sizePropSet = new HashSet<>();
JSONArray options_0_Arr = productDataObj.getJSONArray("baseOptions").getJSONObject(0).getJSONArray("options");
for (int i = 0; i < options_0_Arr.size(); i++) {
JSONObject options_0_Obj = options_0_Arr.getJSONObject(i);
ProductProp productPropSize = new ProductProp();
String sizeNo = options_0_Obj.getString("epoSizeCode");
productPropSize.setPropId(sizeNo);
sizeNoList.add(sizeNo);
productPropSize.setPropName(options_0_Obj.getString("epoSizeName") + options_0_Obj.getString("sizeDescription"));
sizePropSet.add(productPropSize);
if (productPropSet.get("尺码") == null) {
productPropSet.put("尺码", sizePropSet);
} else {
Set<ProductProp> oldPropSet = productPropSet.get("尺码");
sizePropSet.addAll(oldPropSet);
productPropSet.put("尺码", sizePropSet);
}
}
///////////////////////// Size properties END /////////////////////////////////////////////////////
for (String colorNo : colorNoList) {
for (String sizeNo : sizeNoList) {
// Build skuStr, the key that ties stock and price to one color/size pair
String skuStr = ";" + colorNo + ";" + sizeNo + ";";
//////////////////////////////////// Stock ////////////////////////////////////////////
// Flag: the product carries stock information
productResponse.setStockFlag(true);
List<ProductSkuStock> productSkuStockList = dynStock.getProductSkuStockList();
if (productSkuStockList == null) {
productSkuStockList = new ArrayList<>();
}
ProductSkuStock productSkuStock = new ProductSkuStock();
// Sellable quantity: the data has no usable stock figure, so use a placeholder
productSkuStock.setSellableQuantity(999);
// The SKU id this stock entry belongs to
productSkuStock.setSkuStr(skuStr);
productSkuStockList.add(productSkuStock);
dynStock.setProductSkuStockList(productSkuStockList);
//////////////////////////////////// Stock END /////////////////////////////////////////
//////////////////////////////////// Original price //////////////////////////////////
OriginalPrice originalPrice = new OriginalPrice();
// Read the product's original price
String fullPrice = productDataObj.getJSONObject("price").getString("value");
// TODO convert the exchange rate; the price is currently in RMB
fullPrice = exchangeRate(fullPrice);
originalPrice.setPrice(fullPrice);
productResponse.setPrice(fullPrice);
productResponse.setSalePrice(fullPrice + "-" + fullPrice);
originalPrice.setSkuStr(skuStr);
originalPriceList.add(originalPrice);
//////////////////////////////////// Original price END //////////////////////////////////
}
}
// Populate the JSON data in the following order
productResponse.setPropFlag(true);
productResponse.setProductPropSet(productPropSet);
productResponse.setPlatform(PlatformEnum.MOCO.getValue());
productResponse.setPromotionList(promotionList);
productResponse.setOriginalPriceList(originalPriceList);
productResponse.setItemInfo(itemInfo);
productResponse.setDynStock(dynStock);
return productResponse;
}
}
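`MocoSpiderParse` derives each color's image URL by splitting the source URL on `_other_` and reading index 1, which throws `ArrayIndexOutOfBoundsException` when the marker is missing. The following is a minimal sketch of a guarded variant. The class name `MocoImageUrl`, the method name and the fall-back-to-raw-URL behavior are assumptions for illustration, not what the commit does. It also keeps everything after the first marker, which matches `split(...)[1]` only when the marker occurs once.

```java
class MocoImageUrl {
    private static final String MARKER = "_other_";
    private static final String IMAGE_HOST = "https://mallimg.moco.com/";

    /**
     * Rewrites ".../<anything>_other_<suffix>" to "<host><pId>_list_<suffix>".
     * Returns rawUrl unchanged when the marker is absent (assumed fallback).
     */
    static String listImageUrl(String pId, String rawUrl) {
        int idx = rawUrl.indexOf(MARKER);
        if (idx < 0) {
            return rawUrl;
        }
        String suffix = rawUrl.substring(idx + MARKER.length());
        return IMAGE_HOST + pId + "_list_" + suffix;
    }

    public static void main(String[] args) {
        // Made-up id and URL, for illustration only
        System.out.println(listImageUrl("P1", "https://example.com/P1_other_01.jpg"));
        // prints https://mallimg.moco.com/P1_list_01.jpg
    }
}
```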
@@ -4,6 +4,7 @@ import com.diaoyun.zion.chinafrica.enums.PlatformEnum;
 import com.diaoyun.zion.chinafrica.vo.*;
 import org.jsoup.Jsoup;
 import org.jsoup.nodes.Document;
+import org.jsoup.nodes.Element;
 import org.jsoup.select.Elements;
 import java.util.*;
@@ -59,9 +60,14 @@ public class OchirlySpiderParse {
 List<String> pImgList = colorEle.select("img").eachAttr("src");
 // Get the sizes
 Elements sizeEle = document.select("div[class=size]").select("div[class=size_contain]").select("li");
-List<String> pSizeList = sizeEle.eachText();
-List<String> pSizeNoList = sizeEle.eachAttr("data-size-id");
+List<String> pSizeList = new ArrayList<>();
+List<String> pSizeNoList = new ArrayList<>();
+for (Element element : sizeEle) {
+if (element.hasAttr("data-size-id")) {
+pSizeList.add(element.text());
+pSizeNoList.add(element.attr("data-size-id"));
+}
+}
 //////////////////////////////////// Basic item info ////////////////////////////
 ItemInfo itemInfo = new ItemInfo();
 itemInfo.setShopName(PlatformEnum.OCHIRLY.getLabel());
......
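The Ochirly hunk above replaces the `eachText()` / `eachAttr("data-size-id")` pair with an explicit loop. In jsoup, `eachAttr` skips elements that lack the attribute and `eachText` skips elements without text, so the two lists can differ in length and stop lining up by index. The sketch below reproduces the fixed logic in plain Java so it runs without jsoup. `SizePairs` and `Li` are hypothetical stand-ins for the parser and for a parsed `<li>` element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SizePairs {
    /** Stand-in for a parsed <li>: its text plus an optional data-size-id (null when absent). */
    static final class Li {
        final String text;
        final String sizeId;

        Li(String text, String sizeId) {
            this.text = text;
            this.sizeId = sizeId;
        }
    }

    /** Returns [names, ids]; both come from the same element, so index i always matches. */
    static List<List<String>> pair(List<Li> items) {
        List<String> names = new ArrayList<>();
        List<String> ids = new ArrayList<>();
        for (Li li : items) {
            if (li.sizeId != null) { // mirrors element.hasAttr("data-size-id")
                names.add(li.text);
                ids.add(li.sizeId);
            }
        }
        return Arrays.asList(names, ids);
    }

    public static void main(String[] args) {
        // Made-up list items: the first one has text but no data-size-id
        List<Li> items = Arrays.asList(
                new Li("Size guide", null), new Li("S", "101"), new Li("M", "102"));
        System.out.println(pair(items)); // prints [[S, M], [101, 102]]
    }
}
```

An attribute selector such as `li[data-size-id]` in the `select` call would probably achieve the same filtering, though that variant is not what the commit uses.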