The Business Implications of Machine Learning

by Drew Breunig

It’s not about what it can do, but the effects of its prioritization

As buzzwords become ubiquitous they become easier to tune out.

We’ve finely honed this defense mechanism, for good purpose. It’s better to focus on what’s in front of us than the flavor of the week. , but knowing how it works doesn’t help you. VR could eat all media, but it’s from common use.

But please: do not ignore machine learning.

Yes, machine learning will help us build wonderful applications. But that isn’t why I think you should pay attention to it.

You should pay attention to machine learning because it has been prioritized by the companies which drive the technology industry, namely Google, Facebook, and Amazon. The nature of machine learning — how it works, what makes it good, and how it’s delivered — ensures that this strategic prioritization will significantly change the tech industry before even a fraction of machine learning’s value is unleashed.

To understand the impact of machine learning, let’s first explore its nature.

(I am going to use deep learning and machine learning interchangeably. Forgive me, nerds.)

Machine Learning Makes Everything Programmatic

The goal of machine learning, or deep learning, is to make everything programmatic. :

In a nutshell, deep learning is human recognition at computer scale. The first step to create an algorithm is providing a program with lots and lots of data which has been organized by humans, like tagged photos. The program then analyzes the bits of the raw data and notes patterns which correlate with the human organized data.

The program then looks for these known patterns in the wild. This is how Facebook suggests friends to tag in photos and Google Photos searches by people.
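
To make that loop concrete, here is a minimal sketch of "learn patterns from human-tagged data, then look for them in the wild." It is a toy nearest-centroid classifier, not deep learning and not how Facebook or Google actually do it. The two numeric features per photo, the names, and the `train` and `predict` helpers are all invented for illustration.

```python
from collections import defaultdict
from math import dist


def train(labeled_examples):
    """Average the feature vectors seen for each human-provided label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in labeled_examples:
        if label not in sums:
            sums[label] = list(features)
        else:
            sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    # The "model" is just one averaged pattern (centroid) per label.
    return {
        label: tuple(s / counts[label] for s in total)
        for label, total in sums.items()
    }


def predict(model, features):
    """Return the label whose learned pattern is closest to the new data."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical "tagged photos": two made-up numeric features per photo,
# each tagged by a human with the person who appears in it.
tagged = [
    ((0.9, 0.1), "alice"),
    ((0.8, 0.2), "alice"),
    ((0.1, 0.9), "bob"),
    ((0.2, 0.8), "bob"),
]

model = train(tagged)
suggestion = predict(model, (0.85, 0.2))  # an untagged photo "in the wild"
```

The humans supply the labels; the program only finds the pattern that goes with each label and matches new data against it.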

So far, most of the deep learning applications people use are essentially toys: and . These early applications are forgiving. If a learning algorithm misses a face or forces you to edit a tricky word, it’s okay. But as our investment continues and these algorithms become more dependable, we’ll see them deployed in more interesting environments, with more interesting use cases.

The takeaway here is that machine learning allows companies to build better applications that interact with things people create: pictures, speech, text, and other messy things. This allows companies to create software which understands us. The potential is there to solve the user interface problems that’ve been keeping people from computing since . And major UI advancements tend to kick off major eras of computing.

The mouse and graphic interfaces made computers accessible, household objects.

Touch interfaces made computers normal, everyday tools.

Interfaces powered by machine learning will make computing omnipresent. (Eventually)

But there’s a catch:

Machine Learning is Only as Good as its Training Data

To make a machine learning model you need three things, in order of importance:

1. Training Data: Data which has been tagged, categorized, or otherwise sorted by humans.

2. Software: The software library which builds the machine learning models by evaluating training data.

3. Hardware: The CPUs and GPUs which run the software’s calculations.

Hardware is easy enough to acquire. , , whatever.

Software is even easier to acquire! If you rented, you may have accidentally . If not, .

Now all you need is training data. And lots of it!

Good luck.

Before we get into how exactly screwed you are, let’s first understand why you need so much training data in the first place.

Our deep and machine learning software is good. Better than it was! But to work well it requires tons of training data to produce good results. This cannot be overstated: the quality of the models you make is directly correlated to the quantity and quality of the training data the software intakes. Until we have better software we’re unable to build good models from small datasets. (And when I say “small” I mean, not ginormous.)
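
A toy way to see quantity matter, under loud assumptions: the "truth" below is an arbitrary cutoff I made up, the learner is a 1-nearest-neighbor rule rather than a deep network, and the labeled examples are evenly spaced instead of messy. Real models and real data behave less tidily, but the direction is the same: more labeled examples pin the model's guess closer to what the humans meant.

```python
def true_label(x):
    # Hypothetical ground truth that the human labelers apply:
    # an arbitrary cutoff chosen only for this illustration.
    return 1 if x >= 0.37 else 0


def make_training_set(n):
    # n evenly spaced, human-labeled examples on the interval [0, 1].
    points = [(i + 0.5) / n for i in range(n)]
    return [(x, true_label(x)) for x in points]


def predict(training_set, x):
    # 1-nearest-neighbor: copy the label of the closest labeled example.
    nearest = min(training_set, key=lambda example: abs(example[0] - x))
    return nearest[1]


def error_rate(n, test_points=1000):
    # Fraction of unseen points the model gets wrong when trained on n examples.
    training_set = make_training_set(n)
    xs = [j / test_points for j in range(test_points)]
    wrong = sum(predict(training_set, x) != true_label(x) for x in xs)
    return wrong / test_points


# Train on 2, 10, and 100 labeled examples and compare the mistakes.
errors = {n: error_rate(n) for n in (2, 10, 100)}
```

In this setup the error shrinks at each step from 2 to 10 to 100 examples, because the nearest labeled example sits ever closer to the true cutoff.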

Unfortunately, better software is not going to arrive overnight. While most software gets incrementally better, as developers squash bugs week by week, machine learning will likely advance in a fashion: in a few, hard-won, big leaps.

The reason for this is that deep learning software is nearly impossible to debug because we don’t fully understand how it works. To me, this is the weirdest thing about machine learning. We don’t really know what makes it tick. We can’t debug it systematically; we can only guess and check.

Pete Warden, machine learning evangelist extraordinaire, :

Even though the Krizhevsky approach won the 2012 Imagenet competition, nobody can claim to fully understand why it works so well, which design decisions and parameters are most important. It’s a fantastic trial-and-error solution that works in practice, but we’re a long way from understanding how it works in theory. That means that we can expect to see speed and result improvements as researchers gain a better understanding of why it’s effective, and how it can be optimized. As one of my friends put it, , but they’re doing it because the potential payoff is so big.

Until we understand how deep learning works, we need to make up for its inadequacies with big piles of training data.

Training data is the lifeblood of machine learning.

So how do we get it?

Learning to Use Every Part of the Buffalo (or User)

If computers are to understand messy, human things, they need to be taught by messy humans. Makes sense. But when we remember how much data we’re going to need to make our models, we’re faced with a challenge: where are we going to find tons of people willing to spend their spare time creating our training data?

If you said, “I’ll hire them,” I have some bad news. At this scale paying them is pretty much out of the question.

If you said, “I’ll trick them,” you’re getting warmer.

A frequent refrain among people who write about the Internet is: “if you’re not paying, you’re the product.” These writers are commenting on ad-supported products (like Facebook, Google, Tumblr, SnapChat, and most everything else online) that package up your attention and sell it to advertisers. But their refrain works just as well for machine learning.

Users of free services are the humans who will train computers in order to build better products and services. The ‘free’ part is crucial because it allows for the massive numbers of users which our data needs require.

All of this makes me think of the old line about Native Americans using every part of the buffalo. Online services are learning how to use more parts of their users. Our attention creates their advertising and our knowledge fuels their deep learning models.

The trick to obtaining sufficient training data, then, is twofold. You need to:

1. Attract a bunch of people.

2. Convince them to create your training data.

It’s Tom Sawyer and picket fences, just multiplied by several hundred million.

The Rise of Reciprocal Data Applications (RDAs)

A new category of application, or application feature, has emerged to facilitate your fence painting. These applications are designed to spur the creation of training data as well as deliver the products powered by the data captured. People get better apps and companies get better data.

The clearest example of such a reciprocal data application (or RDA, for short) is Facebook Photos.

Facebook Photos has been designed to prompt viewers to tag people in photos, easily and quickly. A clear call to action frames the faces of your friends and family after uploading an image. Tagging provides clear benefits to you, both for later searching and alerting those tagged in photos. Tagging garners attention and starts a conversation, which (non-coincidentally) are two of the main reasons why people use Facebook.

Meanwhile, all this tagging creates a massive pool of training data which can be used to train machine learning models. With better models come better tagging suggestions and other features. Thanks to this RDA, .

Google Search is another RDA. Your searches and selections provide training data to Google, which helps make its search even better.

Like their other products, both Google Search and Facebook Photos demonstrate how RDAs generate significant network effects. The more people use an app, the more data is generated, the better the app becomes, the more people use the app…

Network effects are the engine needed for venture-backed companies in winner-take-all markets. Previously, the default network effect methods in the Valley were social/chat (you go where your friends are) or marketplaces (sellers go where the buyers are). This is why almost every non-marketplace venture-backed app or service shoehorns in sharing or communication features, even when they make no sense in the app.

RDAs are a new method for creating network effects which is just now becoming understood. As awareness of its business value grows, expect RDAs to propagate throughout the landscape.

This propagation of RDAs will be the first major business impact of machine learning. Not only because they’ll divert resources, but because the qualities and requirements of RDAs will influence the hardware and software which deploy them.

Here are the qualities of a Reciprocal Data Application:

1. Apps must be networked, preferably always. Otherwise, they cannot send the data they capture back home.

2. Nearly all computation takes place off-device. The bulk of computation is the creation of the models, which requires access to the massive dataset created by all users. Hence, model construction cannot take place on the device. Comparing new data to computed models (for example, identifying an object or person in a picture or recognizing a spoken phrase) is computationally cheap.
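
The split between the two qualities above can be sketched in a few lines. This is an illustration under stated assumptions, not any company's architecture: the `Server` class, the `recognize_on_device` function, the word-voting "model," and the sample phrases are all hypothetical. The point is only the shape of the data flow: every user's labeled data goes home, the heavy pass over the pooled dataset happens there, and the device gets a small lookup table to compare new input against.

```python
from collections import Counter, defaultdict


class Server:
    """Off-device side: pools every user's labeled data and builds the model."""

    def __init__(self):
        self.pooled = []  # (text, label) pairs uploaded by all users

    def receive(self, examples):
        # Quality 1: apps are networked, so captured data can be sent home.
        self.pooled.extend(examples)

    def build_model(self):
        # The expensive step: a pass over the whole pooled dataset.
        word_label_counts = defaultdict(Counter)
        for text, label in self.pooled:
            for word in text.lower().split():
                word_label_counts[word][label] += 1
        # Ship only a compact table: each word mapped to its most common label.
        return {
            word: counts.most_common(1)[0][0]
            for word, counts in word_label_counts.items()
        }


def recognize_on_device(model, text):
    """On-device side: a cheap lookup against the precomputed model."""
    votes = Counter(model[w] for w in text.lower().split() if w in model)
    return votes.most_common(1)[0][0] if votes else None


server = Server()
# Two hypothetical users, each contributing a little labeled data.
server.receive([("play some jazz", "music"), ("call mom", "phone")])
server.receive([("play the beatles", "music"), ("call the office", "phone")])

model = server.build_model()
result = recognize_on_device(model, "call the office")
```

Neither user could have built this table alone; the device only ever needs the finished table, never the pooled data.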
