E-commerce Analytics Platform----Requirement 3: Top 10 Popular Categories

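Goal
Among the user actions that meet the filter conditions, find the 10 categories ranked highest by click, order, and payment counts. The top 10 is sorted by click count, then order count, then payment count; that is, click count takes priority.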
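The ranking rule above is a lexicographic comparison: click count decides first, order count breaks ties, and payment count breaks any remaining ties. As a minimal sketch, using plain Scala collections instead of Spark and a hypothetical `CategoryStats` record (not part of the original project), the rule can be written as a reversed tuple ordering:

```scala
object RankingSketch {
  // Hypothetical per-category totals; the field names are illustrative only.
  final case class CategoryStats(categoryId: Long, clickCount: Long, orderCount: Long, payCount: Long)

  // Compare clicks first, then orders, then payments; .reverse makes all three descending.
  val byPopularity: Ordering[CategoryStats] =
    Ordering.by((s: CategoryStats) => (s.clickCount, s.orderCount, s.payCount)).reverse

  // Returns the n most popular categories under that ordering.
  def top(stats: Seq[CategoryStats], n: Int): Seq[CategoryStats] =
    stats.sorted(byPopularity).take(n)
}
```

Reversing the tuple ordering makes all three keys descending at once; the custom `SortKey` in step 5 encodes the same rule so it can be used with Spark's `sortByKey`.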
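Requirement analysis
What we want is the 10 most popular categories among the actions that satisfy the filter conditions, ranked by click count, then order count, then payment count. Working backward:
top 10 categories --> (id,(clickCount=83|orderCount=67|payCount=63)) --> count (id,clickCount=…) and (id,orderCount=…) separately --> requires the raw data that satisfies the conditions
Steps
1. Get the raw data that passes the filter conditions from requirement 1 (join operator)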
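// actionRdd: raw UserVisitAction records for the task (getOriActionRDD is assumed to be requirement 1's loader)
val actionRdd = getOriActionRDD(session, task)
val sessionId2ActionRDD = actionRdd.map {
  item => (item.session_id, item)
}
// inner join with FilterInfo (the sessions that passed requirement 1's filters), then drop the session info
val sessionId2FilterActionRDD = sessionId2ActionRDD.join(FilterInfo).map {
  case (sessionId, (action, info)) => (sessionId, action)
}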
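The join in step 1 works purely as a filter: an inner join on session ID keeps only actions whose session survived requirement 1, and the `map` then discards the session info. A local sketch of the same semantics, assuming a hypothetical `Action` type and representing the filtered session info as a `Map[String, String]`:

```scala
object JoinFilterSketch {
  // Hypothetical stand-in for UserVisitAction, reduced to the fields used here.
  final case class Action(sessionId: String, clickCategoryId: Long)

  // Inner-join semantics: an action survives only if its session is a key of filteredSessions.
  // The session info itself is dropped, as in map { case (sid, (action, _)) => (sid, action) }.
  def keepFilteredActions(actions: Seq[Action],
                          filteredSessions: Map[String, String]): Seq[(String, Action)] =
    actions
      .filter(a => filteredSessions.contains(a.sessionId))
      .map(a => (a.sessionId, a))
}
```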
2. Collect every categoryId that had a click, order, or payment action
var cid2CidRdd = sessionId2FilterActionRDD.flatMap {
  case (sessionId, action: UserVisitAction) => {
    val categoryBuffer = new ArrayBuffer[(Long, Long)]()
    // click action
    if (action.click_category_id != -1) {
      categoryBuffer += ((action.click_category_id, action.click_category_id))
    } else if (action.order_category_ids != null) {
      // order action: the ids are a comma-separated string
      for (orderCid <- action.order_category_ids.split(","))
        categoryBuffer += ((orderCid.toLong, orderCid.toLong))
    } else if (action.pay_category_ids != null) {
      // payment action: the ids are a comma-separated string
      for (payCid <- action.pay_category_ids.split(","))
        categoryBuffer += ((payCid.toLong, payCid.toLong))
    }
    categoryBuffer
  }
}
cid2CidRdd = cid2CidRdd.distinct()
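The flatMap above emits one (cid, cid) pair per category an action touched, and `distinct()` removes repeats so every category appears exactly once. A local sketch of that logic, assuming a simplified hypothetical `Action` in which `Option[String]` stands in for the nullable id-list fields:

```scala
object CategoryIdSketch {
  // Hypothetical simplified action: -1 means "not a click"; None plays the role of a null id list.
  final case class Action(clickCategoryId: Long,
                          orderCategoryIds: Option[String],
                          payCategoryIds: Option[String])

  // Same precedence as the flatMap above: click id, else order ids, else pay ids.
  def categoryIds(action: Action): Seq[(Long, Long)] = {
    val ids: Seq[Long] =
      if (action.clickCategoryId != -1L) Seq(action.clickCategoryId)
      else action.orderCategoryIds.orElse(action.payCategoryIds)
        .map(_.split(",").toSeq.map(_.toLong))
        .getOrElse(Seq.empty)
    ids.map(id => (id, id))
  }

  // flatMap followed by distinct, mirroring cid2CidRdd.distinct().
  def allCategoryIds(actions: Seq[Action]): Seq[(Long, Long)] =
    actions.flatMap(categoryIds).distinct
}
```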
3. Count click, order, and payment actions separately:
// Step 2: count clicks, orders, and payments per category
val cid2ClickCountRDD =getClickCount(sessionId2FilterActionRDD)
val cid2OrderCountRDD =getOrderCount(sessionId2FilterActionRDD)
val cid2PayCountRDD =getPayCount(sessionId2FilterActionRDD)
def getClickCount(sessionId2FilterActionRDD: RDD[(String, UserVisitAction)]): RDD[(Long, Long)] = {
  val clickFilterRDD = sessionId2FilterActionRDD.filter {
    case (sessionId, action: UserVisitAction) => action.click_category_id != -1L
  }
  val clickNumRDD = clickFilterRDD.map {
    case (sessionId, action) => (action.click_category_id, 1L)
  }
  // sum the 1s per category
  clickNumRDD.reduceByKey(_ + _)
}
def getOrderCount(sessionId2FilterActionRDD: RDD[(String, UserVisitAction)]): RDD[(Long, Long)] = {
  val orderFilterRDD = sessionId2FilterActionRDD.filter(item => item._2.order_category_ids != null)
  val orderNumRDD = orderFilterRDD.flatMap {
    case (sessionId, action) =>
      // one (categoryId, 1L) per id in the comma-separated list
      action.order_category_ids.split(",").map(id => (id.toLong, 1L))
  }
  orderNumRDD.reduceByKey(_ + _)
}
def getPayCount(sessionId2FilterActionRDD: RDD[(String, UserVisitAction)]): RDD[(Long, Long)] = {
  val payFilterRDD = sessionId2FilterActionRDD.filter(item => item._2.pay_category_ids != null)
  val payNumRDD = payFilterRDD.flatMap {
    case (sid, action) =>
      action.pay_category_ids.split(",").map(item => (item.toLong, 1L))
  }
  payNumRDD.reduceByKey(_ + _)
}
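Each counting function follows the same pattern: filter, map to (categoryId, 1L), then sum per key, which in Spark is `reduceByKey(_ + _)`. On a local collection the equivalent aggregation can be sketched as follows (the object and method names are hypothetical):

```scala
object CountSketch {
  // Local equivalent of map(id => (id, 1L)).reduceByKey(_ + _).
  def countByCategory(categoryIds: Seq[Long]): Map[Long, Long] =
    categoryIds.groupBy(identity).map { case (id, hits) => id -> hits.size.toLong }

  // Order and pay ids arrive as comma-separated strings, so split them before counting.
  def countFromIdLists(idLists: Seq[String]): Map[Long, Long] =
    countByCategory(idLists.flatMap(_.split(",").toSeq.map(_.toLong)))
}
```

On a cluster, `reduceByKey` is preferable to a `groupByKey` followed by a size count, because it combines partial sums on each partition before shuffling.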
4. Use left outer joins to combine all the counts. The final format is (categoryId, str), where str is the combined count string (it also starts with the category ID field), e.g. (clickCount=83|orderCount=67|payCount=63)
def getFullCount(cid2CidRDD: RDD[(Long, Long)], cid2ClickCountRDD: RDD[(Long, Long)], cid2OrderCountRDD: RDD[(Long, Long)], cid2PayCountRDD: RDD[(Long, Long)]): RDD[(Long, String)] = {
  // a left outer join keeps every category; a missing count comes back as None
  val cid2ClickInfoRDD = cid2CidRDD.leftOuterJoin(cid2ClickCountRDD).map {
    case (cId, (categoryId, option)) =>
      val clickCount = option.getOrElse(0L)
      val aggrCount = Constants.FIELD_CATEGORY_ID + "=" + cId + "|" +
        Constants.FIELD_CLICK_COUNT + "=" + clickCount
      (cId, aggrCount)
  }
  val cid2OrderInfoRDD = cid2ClickInfoRDD.leftOuterJoin(cid2OrderCountRDD).map {
    case (cid, (clickInfo, option)) =>
      val orderCount = option.getOrElse(0L)
      val aggrInfo = clickInfo + "|" +
        Constants.FIELD_ORDER_COUNT + "=" + orderCount
      (cid, aggrInfo)
  }
  val cid2PayInfoRDD = cid2OrderInfoRDD.leftOuterJoin(cid2PayCountRDD).map {
    case (cid, (orderInfo, option)) =>
      val payCount = option.getOrElse(0L)
      val aggrInfo = orderInfo + "|" +
        Constants.FIELD_PAY_COUNT + "=" + payCount
      (cid, aggrInfo)
  }
  cid2PayInfoRDD
}
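`leftOuterJoin` keeps every key from the left RDD and yields `None` for keys missing on the right, which is why each count falls back to 0. A local sketch of the same merge, using `Map.getOrElse` in place of the joins and assuming the literal field names `categoryid`, `clickCount`, `orderCount`, and `payCount` (the real values of the `Constants.FIELD_*` fields are not shown in this post):

```scala
object FullCountSketch {
  // Every category in allIds is kept (left side of the join); missing counts become 0.
  def fullCount(allIds: Seq[Long],
                clicks: Map[Long, Long],
                orders: Map[Long, Long],
                pays: Map[Long, Long]): Map[Long, String] =
    allIds.map { cid =>
      // Assumed field names; the project reads them from Constants.FIELD_*.
      val info = s"categoryid=$cid|clickCount=${clicks.getOrElse(cid, 0L)}" +
        s"|orderCount=${orders.getOrElse(cid, 0L)}|payCount=${pays.getOrElse(cid, 0L)}"
      cid -> info
    }.toMap
}
```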
5. Define a custom sort key, convert the data to (sortKey, info), then sort in descending order with sortByKey(false)
Custom sort key:
package server
case class SortKey(clickCount: Long, orderCount: Long, payCount: Long) extends Ordered[SortKey] {
  // this.compare(that), read as "this compare that":
  // compare > 0  means this > that
  // compare < 0  means this < that
  // java.lang.Long.compare avoids the overflow that (a - b).toInt can cause for large differences
  override def compare(that: SortKey): Int = {
    if (this.clickCount != that.clickCount) {
      java.lang.Long.compare(this.clickCount, that.clickCount)
    } else if (this.orderCount != that.orderCount) {
      java.lang.Long.compare(this.orderCount, that.orderCount)
    } else {
      java.lang.Long.compare(this.payCount, that.payCount)
    }
  }
}
val sortRDD = cid2FullCountRDD.map {
  case (cId, info) =>
    // parse each count back out of the "|"-joined string
    val clickCount = StringUtil.getFieldFromConcatString(info, "\\|", Constants.FIELD_CLICK_COUNT).toLong
    val orderCount = StringUtil.getFieldFromConcatString(info, "\\|", Constants.FIELD_ORDER_COUNT).toLong
    val payCount = StringUtil.getFieldFromConcatString(info, "\\|", Constants.FIELD_PAY_COUNT).toLong
    val sortKey = SortKey(clickCount, orderCount, payCount)
    (sortKey, info)
}
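The sort step parses each count back out of the `|`-joined string. The delimiter is written `"\\|"` because `String.split` takes a regular expression, in which a bare `|` means alternation. The project's `StringUtil` implementation is not shown, so the helper below is a hypothetical stand-in that returns `Option[String]` instead of failing on a missing field:

```scala
object ConcatStringSketch {
  // Hypothetical helper, not the project's StringUtil: returns the value of `field`
  // in a string such as "clickCount=83|orderCount=67|payCount=63".
  def fieldFromConcatString(str: String, delimiterRegex: String, field: String): Option[String] =
    str.split(delimiterRegex).iterator
      .map(_.split("=", 2))                // "key=value" -> Array(key, value)
      .collectFirst { case Array(key, value) if key == field => value }
}
```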
// 5. Sort
val top10=sortRDD.sortByKey(false).take(10);
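`sortByKey(false)` sorts in descending order using the implicit `Ordering` that Scala derives from `SortKey`'s `compare`, and `take(10)` returns the first ten records to the driver. The sketch below reproduces that on a local collection (the object name `TopNSketch` is hypothetical); its `SortKey` compares with `java.lang.Long.compare`, since `(a - b).toInt` can overflow or truncate when two counts differ by more than `Int.MaxValue`:

```scala
object TopNSketch {
  // Same comparison order as SortKey above: clicks, then orders, then payments.
  final case class SortKey(clickCount: Long, orderCount: Long, payCount: Long) extends Ordered[SortKey] {
    override def compare(that: SortKey): Int =
      if (clickCount != that.clickCount) java.lang.Long.compare(clickCount, that.clickCount)
      else if (orderCount != that.orderCount) java.lang.Long.compare(orderCount, that.orderCount)
      else java.lang.Long.compare(payCount, that.payCount)
  }

  // Local equivalent of sortRDD.sortByKey(false).take(n).
  def topN[V](rows: Seq[(SortKey, V)], n: Int): Seq[(SortKey, V)] =
    rows.sortBy(_._1)(Ordering[SortKey].reverse).take(n)
}
```

For example, `SortKey(Long.MaxValue, 0, 0)` compares greater than `SortKey(-1, 0, 0)` here, whereas the subtraction-based version computes `Long.MaxValue - (-1)`, which wraps to `Long.MinValue`, whose `toInt` is 0, so the two keys would compare as equal.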
6. Package the data and write it to the database
// 6. Package the data and write it to the database
val top10CategoryRDD = sparkSession.sparkContext.makeRDD(top10).map {
  case (sortKey, countInfo) =>
    val cid = StringUtil.getFieldFromConcatString(countInfo, "\\|", Constants.FIELD_CATEGORY_ID).toLong
    val clickCount = sortKey.clickCount
    val orderCount = sortKey.orderCount
    val payCount = sortKey.payCount
    Top10Category(taskUUID, cid, clickCount, orderCount, payCount)
}
// Save to the database
/* import sparkSession.implicits._
top10CategoryRDD.toDF().write
.format("jdbc")
.option("url", String(Constants.JDBC_URL))
.option("user", String(Constants.JDBC_USER))
.option("password", String(Constants.JDBC_PASSWORD))
.option("dbtable", "top10_category_0308")
.mode(SaveMode.Append)
.save*/
Full code:
package server
import commons.constant.Constants
import commons.model.{Top10Category, UserVisitAction}
import commons.utils.StringUtil
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import scala.collection.mutable.ArrayBuffer
class serverThree  extends Serializable {
def top10PopularCategories(sparkSession: SparkSession,
taskUUID: String,
sessionId2FilterActionRDD: RDD[(String, UserVisitAction)])={
// 1. Convert all base data into the full set of (cId, cId) pairs
var cid2CidRdd = sessionId2FilterActionRDD.flatMap {
  case (sessionId, action: UserVisitAction) => {
    val categoryBuffer = new ArrayBuffer[(Long, Long)]()
    // click action
    if (action.click_category_id != -1) {
      categoryBuffer += ((action.click_category_id, action.click_category_id))
    } else if (action.order_category_ids != null) {
      for (orderCid <- action.order_category_ids.split(","))
        categoryBuffer += ((orderCid.toLong, orderCid.toLong))
    } else if (action.pay_category_ids != null) {
      for (payCid <- action.pay_category_ids.split(","))
        categoryBuffer += ((payCid.toLong, payCid.toLong))
    }
    categoryBuffer
  }
}
cid2CidRdd = cid2CidRdd.distinct()
// 2. Count clicks, orders, and payments per category
val cid2ClickCountRDD =getClickCount(sessionId2FilterActionRDD)
val cid2OrderCountRDD =getOrderCount(sessionId2FilterActionRDD)
val cid2PayCountRDD =getPayCount(sessionId2FilterActionRDD)
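// 3. Left-outer-join the full category set cid2CidRdd with each count from step 2 in turn, producing (cid, str)
// where str is the "|"-separated count string (field=count|...)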
val cid2FullCountRDD =getFullCount(cid2CidRdd,cid2ClickCountRDD,cid2OrderCountRDD,cid2PayCountRDD);
// 4. Use the custom sort key and convert the data to (sortKey, info)
val sortRDD = cid2FullCountRDD.map {
  case (cId, info) =>
    val clickCount = StringUtil.getFieldFromConcatString(info, "\\|", Constants.FIELD_CLICK_COUNT).toLong
    val orderCount = StringUtil.getFieldFromConcatString(info, "\\|", Constants.FIELD_ORDER_COUNT).toLong
    val payCount = StringUtil.getFieldFromConcatString(info, "\\|", Constants.FIELD_PAY_COUNT).toLong
    val sortKey = SortKey(clickCount, orderCount, payCount)
    (sortKey, info)
}
// 5. Sort
val top10=sortRDD.sortByKey(false).take(10);
top10.foreach(println);
// 6. Package the data and write it to the database
val top10CategoryRDD = sparkSession.sparkContext.makeRDD(top10).map {
  case (sortKey, countInfo) =>
    val cid = StringUtil.getFieldFromConcatString(countInfo, "\\|", Constants.FIELD_CATEGORY_ID).toLong
    val clickCount = sortKey.clickCount
    val orderCount = sortKey.orderCount
    val payCount = sortKey.payCount
    Top10Category(taskUUID, cid, clickCount, orderCount, payCount)
}
// Save to the database
/* import sparkSession.implicits._
top10CategoryRDD.toDF().write
.format("jdbc")
.option("url", String(Constants.JDBC_URL))
.option("user", String(Constants.JDBC_USER))
.option("password", String(Constants.JDBC_PASSWORD))
.option("dbtable", "top10_category_0308")
.mode(SaveMode.Append)
.save*/
top10
}
def getFullCount(cid2CidRDD: RDD[(Long, Long)], cid2ClickCountRDD: RDD[(Long, Long)], cid2OrderCountRDD: RDD[(Long, Long)], cid2PayCountRDD: RDD[(Long, Long)])={
val cid2ClickInfoRDD=cid2CidRDD.leftOuterJoin(cid2ClickCountRDD).map{
case(cId,(categoryId,option))=>{
val clickCount = option.getOrElse(0L)
val aggrCount = Constants.FIELD_CATEGORY_ID +"="+ cId +"|"+
Constants.FIELD_CLICK_COUNT +"="+ clickCount
(cId, aggrCount)

Published: 2024-09-22 04:13:15

Article link: https://www.17tex.com/xueshu/453155.html

