Object Tracking Papers from CVPR, ICCV, and ECCV 2018

Calling researchers working on object tracking: if you have nowhere to exchange ideas with peers in the same field, we have created a discussion group, and the link above will take you in. Weekly activities are announced through this account, and discussion can start anytime in the group, whether on academic topics, environment setup, or walkthroughs of experiments. Welcome to join us and improve together!
CVPR 2018
Papers retrieved for the keyword "Track":
1. GANerated Hands for Real-Time 3D Hand Tracking From Monocular RGB
2. Detect-and-Track: Efficient Pose Estimation in Videos
3. Context-Aware Deep Feature Compression for High-Speed Visual Tracking
4. Correlation Tracking via Joint Discrimination and Reliability Learning
5. Hyperparameter Optimization for Tracking With Continuous Deep Q-Learning
6. A Prior-Less Method for Multi-Face Tracking in Unconstrained Videos
7. End-to-End Flow Correlation Tracking With Spatial-Temporal Attention
8. CarFusion: Combining Point Tracking and Part Detection for Dynamic 3D Reconstruction of Vehicles
9. A Causal And-Or Graph Model for Visibility Fluent Reasoning in Tracking Interacting Objects
10. Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting With a Single Convolutional Net
11. Towards Dense Object Tracking in a 2D Honeybee Hive
12. Efficient Diverse Ensemble for Discriminative Co-Tracking
13. Rolling Shutter and Radial Distortion Are Features for High Frame Rate Multi-Camera Tracking
14. A Twofold Siamese Network for Real-Time Object Tracking
15. Multi-Cue Correlation Filters for Robust Visual Tracking
16. Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking
17. SINT++: Robust Visual Tracking via Adversarial Positive Instance Generation
18. High-Speed Tracking With Multi-Kernel Correlation Filters
19. Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking
20. WILDTRACK: A Multi-Camera HD Dataset for Dense Unscripted Pedestrian Detection
21. PoseTrack: A Benchmark for Human Pose Estimation and Tracking
22. Fusing Crowd Density Maps and Visual Object Trackers for People Tracking in Crowd Scenes
23. Features for Multi-Target Multi-Camera Tracking and Re-Identification
24. MX-LSTM: Mixing Tracklets and Vislets to Jointly Forecast Trajectories and Head Poses
25. Tracking Multiple Objects Outside the Line of Sight Using Speckle Imaging
26. Fast and Accurate Online Video Object Segmentation via Tracking Parts
27. Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies
28. Learning Spatial-Aware Regressions for Visual Tracking
29. High Performance Visual Tracking With Siamese Region Proposal Network
30. VITAL: VIsual Tracking via Adversarial Learning
Deep-learning-related papers:
1. Hyperparameter Optimization for Tracking With Continuous Deep Q-Learning
2. End-to-End Flow Correlation Tracking With Spatial-Temporal Attention
3. Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting With a Single Convolutional Net
4. A Twofold Siamese Network for Real-Time Object Tracking
5. Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking
6. SINT++: Robust Visual Tracking via Adversarial Positive Instance Generation
7. Fusing Crowd Density Maps and Visual Object Trackers for People Tracking in Crowd Scenes
8. Fast and Accurate Online Video Object Segmentation via Tracking Parts
9. Learning Spatial-Aware Regressions for Visual Tracking
10. High Performance Visual Tracking With Siamese Region Proposal Network
11. VITAL: VIsual Tracking via Adversarial Learning
Abstracts:
- Hyperparameter Optimization for Tracking With Continuous Deep Q-Learning
Hyperparameters are numerical presets whose values are assigned prior to the commencement of the learning process. Selecting appropriate hyperparameters is critical for the accuracy of tracking algorithms, yet it is difficult to determine their optimal values, in particular, adaptive ones for each specific video sequence. Most hyperparameter optimization algorithms depend on searching a generic range and they are imposed blindly on all sequences. Here, we propose a novel hyperparameter optimization method that can find optimal hyperparameters for a given sequence using an action-prediction network leveraged on Continuous Deep Q-Learning. Since the common state-spaces for object tracking tasks are significantly more complex than the ones in traditional control problems, existing Continuous Deep Q-Learning algorithms cannot be directly applied. To overcome this challenge, we introduce an efficient heuristic to accelerate the convergence behavior. We evaluate our method on several tracking benchmarks and demonstrate its superior performance.
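As a rough illustration of the core idea (not the authors' implementation), the sketch below shows a NAF-style network, one common formulation of Continuous Deep Q-Learning, that predicts a state value V(s), a greedy continuous hyperparameter adjustment mu(s), and a quadratic advantage term. All names and dimensions here are assumptions for illustration.

```python
# Hypothetical sketch of continuous deep Q-learning (NAF-style) for
# hyperparameter adaptation. Not the paper's code; shapes are invented.
import torch
import torch.nn as nn

class NAFHyperparamNet(nn.Module):
    def __init__(self, state_dim: int, n_hyperparams: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value = nn.Linear(hidden, 1)           # V(s)
        self.mu = nn.Linear(hidden, n_hyperparams)  # greedy action mu(s)
        # Diagonal advantage curvature for simplicity; the full NAF uses a
        # lower-triangular L(s) with P = L L^T.
        self.log_diag = nn.Linear(hidden, n_hyperparams)

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        h = self.trunk(state)
        v = self.value(h)
        mu = self.mu(h)
        p_diag = self.log_diag(h).exp()              # positive curvature
        adv = -0.5 * (p_diag * (action - mu) ** 2).sum(dim=-1, keepdim=True)
        return v + adv                               # Q(s, a) = V(s) + A(s, a)

net = NAFHyperparamNet(state_dim=64, n_hyperparams=3)
state = torch.randn(1, 64)    # e.g. features summarizing the sequence so far
action = torch.randn(1, 3)    # candidate hyperparameter offsets
q_value = net(state, action)  # scalar Q used in the Bellman target
```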
- End-to-End Flow Correlation Tracking With Spatial-Temporal Attention
Discriminative correlation filters (DCF) with deep convolutional features have achieved favorable performance in recent tracking benchmarks. However, most of existing DCF trackers only consider appearance features of current frame, and hardly benefit from motion and inter-frame information. The lack of temporal information degrades the tracking performance during challenges such as partial occlusion and deformation. In this paper, we propose the FlowTrack, which focuses on making use of the rich flow information in consecutive frames to improve the feature representation and the tracking accuracy. The FlowTrack formulates individual components, including optical flow estimation, feature extraction, aggregation and correlation filters tracking as special layers in network. To the best of our knowledge, this is the first work to jointly train flow and tracking task in deep learning framework. Then the historical feature maps at predefined intervals are warped and aggregated with current ones by the guiding of flow. For adaptive aggregation, we propose a novel spatial-temporal attention mechanism. In experiments, the proposed method achieves leading performance on OTB2013, OTB2015, VOT2015 and VOT2016.
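The central operation, warping a historical feature map toward the current frame under the guidance of flow, can be sketched with PyTorch's grid_sample. The helper below is an illustration under assumed inputs, not the authors' code.

```python
# Minimal sketch of flow-guided feature warping (assumed shapes and inputs).
import torch
import torch.nn.functional as F

def warp_by_flow(feat_prev: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """feat_prev: (B, C, H, W); flow: (B, 2, H, W) in pixel displacements."""
    b, _, h, w = feat_prev.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float()   # (2, H, W), channel 0 = x
    coords = base.unsqueeze(0) + flow             # displaced sampling positions
    # Normalize to [-1, 1] as grid_sample expects; grid shape (B, H, W, 2).
    coords[:, 0] = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords[:, 1] = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = coords.permute(0, 2, 3, 1)
    return F.grid_sample(feat_prev, grid, align_corners=True)

feat_prev = torch.randn(1, 32, 50, 50)
flow = torch.randn(1, 2, 50, 50)
warped = warp_by_flow(feat_prev, flow)  # aligned with the current frame,
                                        # ready for attention-weighted fusion
```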
- Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting With a Single Convolutional Net
In this paper we propose a novel deep neural network that is able to jointly reason about 3D detection, tracking and motion forecasting given data captured by a 3D sensor. By jointly reasoning about these tasks, our holistic approach is more robust to occlusion as well as sparse data at range. Our approach performs 3D convolutions across space and time over a bird's eye view representation of the 3D world, which is very efficient in terms of both memory and computation. Our experiments on a new very large scale dataset captured in several North American cities show that we can outperform the state-of-the-art by a large margin. Importantly, by sharing computation we can perform all tasks in as little as 30 ms.
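The efficiency claim rests on running 3D convolutions over a space-time bird's-eye-view (BEV) tensor; a minimal sketch follows, with all shapes invented for illustration.

```python
# Illustrative only: 3D convolution over a space-time BEV grid.
import torch
import torch.nn as nn

# Input: (batch, channels, time, bev_height, bev_width), e.g. 5 past frames.
bev_seq = torch.randn(1, 16, 5, 144, 80)
spacetime_conv = nn.Conv3d(in_channels=16, out_channels=32,
                           kernel_size=(3, 3, 3), padding=(1, 1, 1))
fused = spacetime_conv(bev_seq)  # mixes temporal and spatial context jointly
```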
- A Twofold Siamese Network for Real-Time Object Tracking
Observing that Semantic features learned in an image classification task and Appearance features learned in a similarity matching task complement each other, we build a twofold Siamese network, named SA-Siam, for real-time object tracking. SA-Siam is composed of a semantic branch and an appearance branch. Each branch is a similarity learning Siamese network. An important design choice in SA-Siam is to separately train the two branches to keep the heterogeneity of the two types of features. In addition, we propose a channel attention mechanism for the semantic branch. Channel-wise weights are computed according to the channel activations around the target position. While the inherited architecture from SiamFC allows our tracker to operate beyond real-time, the twofold design and the attention mechanism significantly improve the tracking performance. The proposed SA-Siam outperforms all other real-time trackers by a large margin on OTB-2013/50/100 benchmarks.
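The channel attention described here, weights computed from activations around the target position, might look roughly like the sketch below; the crop size and MLP width are illustrative assumptions, not the paper's values.

```python
# Hedged sketch of position-aware channel attention (assumed sizes).
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, crop: int = 6):
        super().__init__()
        self.crop = crop
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // 2), nn.ReLU(),
            nn.Linear(channels // 2, channels), nn.Sigmoid(),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        """feat: (B, C, H, W) with the target roughly centered."""
        _, _, h, w = feat.shape
        t, l = (h - self.crop) // 2, (w - self.crop) // 2
        center = feat[:, :, t:t + self.crop, l:l + self.crop]
        pooled = center.mean(dim=(2, 3))       # (B, C) activations near target
        weights = self.mlp(pooled)             # per-channel weights in (0, 1)
        return feat * weights[:, :, None, None]

attn = ChannelAttention(channels=256)
reweighted = attn(torch.randn(1, 256, 22, 22))
```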
- Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking
Offline training for object tracking has recently shown great potentials in balancing tracking accuracy and speed. However, it is still difficult to adapt an offline trained model to a target tracked online. This work presents a Residual Attentional Siamese Network (RASNet) for high performance object tracking. The RASNet model reformulates the correlation filter within a Siamese tracking framework, and introduces different kinds of the attention mechanisms to adapt the model without updating the model online. In particular, by exploiting the offline trained general attention, the target adapted residual attention, and the channel favored feature attention, the RASNet not only mitigates the over-fitting problem in deep network training, but also enhances its discriminative capacity and adaptability due to the separation of representation learning and discriminator learning. The proposed deep architecture is trained from end to end and takes full advantage of the rich spatial temporal information to achieve robust visual tracking. Experimental results on two latest benchmarks, OTB-2015 and VOT2017, show that the RASNet tracker has the state-of-the-art tracking accuracy while running at more than 80 frames per second.
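One plausible reading of the general-plus-residual attention is sketched below: an offline-learned shared attention map is refined by a target-conditioned residual before reweighting the template features. Shapes and layer sizes are assumptions, not the paper's architecture.

```python
# Illustrative sketch of residual attention over template features.
import torch
import torch.nn as nn

class ResidualAttention(nn.Module):
    def __init__(self, h: int, w: int):
        super().__init__()
        self.general = nn.Parameter(torch.zeros(1, 1, h, w))  # shared prior
        self.residual_net = nn.Sequential(                    # target-specific
            nn.Conv2d(256, 64, kernel_size=1), nn.ReLU(),
            nn.Conv2d(64, 1, kernel_size=1),
        )

    def forward(self, template_feat: torch.Tensor) -> torch.Tensor:
        attn = torch.sigmoid(self.general + self.residual_net(template_feat))
        return template_feat * attn

attn = ResidualAttention(h=6, w=6)
z = torch.randn(1, 256, 6, 6)  # template features from the backbone
weighted = attn(z)             # fed into cross-correlation with search features
```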
- SINT++: Robust Visual Tracking via Adversarial Positive Instance Generation
Existing visual trackers are easily disturbed by occlusion, blur and large deformation. Under the challenges of occlusion, motion blur and large object deformation, the performance of existing visual trackers may be limited due to the following issues: i) adopting the dense sampling strategy to generate positive examples makes them less diverse; ii) the training data with different challenging factors are limited, even when a large training dataset is collected. Collecting an even larger training dataset is the most intuitive paradigm, but it may still not cover all situations, and the positive samples remain monotonous. In this paper, we propose to generate hard positive samples via adversarial learning for visual tracking. Specifically speaking, we assume the target objects all lie on a manifold; hence, we introduce the positive samples generation network (PSGN) to sample massive diverse training data by traversing over the constructed target object manifold. The generated diverse target object images can enrich the training dataset and enhance the robustness of visual trackers. To make the tracker more robust to occlusion, we adopt the hard positive transformation network (HPTN), which can generate hard samples for the tracking algorithm to recognize. We train this network with deep reinforcement learning to automatically occlude the target object with a negative patch. Based on the generated hard positive samples, we train a Siamese network for visual tracking, and our experiments validate the effectiveness of the introduced algorithm.
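A toy version of the hard-positive idea is shown below: occlude the target crop with a background patch so the tracker must learn occlusion robustness. In the paper, HPTN chooses the occlusion via reinforcement learning; here the position is fixed and all names are illustrative.

```python
# Toy hard-positive generation by pasting a background patch onto the target.
import numpy as np

def occlude_with_patch(target: np.ndarray, background: np.ndarray,
                       y: int, x: int, size: int) -> np.ndarray:
    """target, background: HxWx3 uint8 crops; paste a size x size patch."""
    hard_pos = target.copy()
    hard_pos[y:y + size, x:x + size] = background[:size, :size]
    return hard_pos

rng = np.random.default_rng(0)
target = rng.integers(0, 255, (64, 64, 3), dtype=np.uint8)
bg = rng.integers(0, 255, (64, 64, 3), dtype=np.uint8)
hard = occlude_with_patch(target, bg, y=20, x=20, size=16)  # occluded positive
```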
- Fusing Crowd Density Maps and Visual Object Trackers for People Tracking in Crowd Scenes
While people tracking has been greatly improved over the recent years, crowd scenes remain particularly challenging for people tracking due to heavy occlusions, high crowd density, and significant appearance variation. To address these challenges, we first design a Sparse Kernelized Correlation Filter (S-KCF) to suppress target response variations caused by occlusions and illumination changes, and spurious responses due to similar distractor objects. We then propose a people tracking framework that fuses the S-KCF response map with an estimated crowd density map using a convolutional neural network (CNN), yielding a refined response map. To train the fusion CNN, we propose a two-stage strategy to gradually optimize the parameters. The first stage is to train a preliminary model in batch mode with image patches selected around the targets, and the second stage is to fine-tune the preliminary model using the real frame-by-frame tracking process. Our density fusion framework can significantly improve people tracking in crowd scenes, and can also be combined with other trackers to improve the tracking performance. We validate our framework on two crowd video datasets: UCSD and PETS2009.
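The fusion step can be pictured as a small CNN over the concatenated response and density maps; the architecture below is an assumption for illustration, not the paper's network.

```python
# Assumed sketch: fuse the S-KCF response map with a crowd-density map.
import torch
import torch.nn as nn

fusion_cnn = nn.Sequential(
    nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=3, padding=1),
)

response = torch.randn(1, 1, 64, 64)  # S-KCF correlation response
density = torch.randn(1, 1, 64, 64)   # estimated crowd density map
refined = fusion_cnn(torch.cat([response, density], dim=1))  # refined response
```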
- Fast and Accurate Online Video Object Segmentation via Tracking Parts
