Archive for January 2012

What Does DSP Mean to You? (Japanese translation)

Thursday, January 5, 2012



What Does ‘Demand-Side Platform’ Mean To You?

What does DSP (demand-side platform) mean to you?

JG: Let me start by saying that one of the unfortunate things about ad technology is that we ruin words. I was in the middle of this once before, when I created the first ad exchange (AdECN) in 2004. We went out into the world and said, “We’ve created an ad exchange and we’re fair, neutral and transparent – and we run an auction for every impression.”

Then suddenly, that idea took hold. Before I knew it, there were no longer any distinctions (at least in PowerPoint) between ad networks and ad exchanges. Nearly every ad network transformed into an exchange, a marketplace, or something other than an ad network. The word “exchange” started to become diluted. We then had to open every sales conversation with, “Let me tell you about our type of exchange.” We would oscillate between saying, “Our competitors are not exchanges,” and “They’re a different kind of exchange.”

So, in 2012, it’s 2006 all over again, except that instead of squabbling about the term “ad exchange,” we’re diluting the term DSP, which has come to mean something other than a demand-side platform. Instead, it’s come to mean, “I plug into RTB (real-time bidding) inventory and I get access to the exchanges.”

There are often a variety of business models built on top of that, such as arbitrage-focused businesses that operate on 50 percent margins and offer little transparency, which I would argue is a business model closer to an ad network.

I’d argue that, more often than not, so-called DSPs are an outsourced agency or a new type of ad network. Some DSPs have more service layers than they do technology, and their margins, their staffing, and their sales pitch certainly do not lead with “platform.”

Anyway, it is admittedly self-serving to make this statement, but we’re one of the few companies that can actually say we’re a platform. We are a demand-side platform, or a buy-side platform. In fact, we often refer to ourselves as a buyer’s platform just because we’re trying to distinguish ourselves from the term DSP, which has come to mean so many different things.

As a buyer’s platform, we operate transparently. We leave a gap in the stack so that agencies and ad networks can create the service layer, and can create and protect our clients’ proprietary advantage when they play on our platform. We are trying to be the ad tech company that powers other ad tech companies instead of trying to compete with them all, which is different. Most DSPs are competing with ad networks and ad agencies. We don’t compete with them; we power them.











“I plug into inventory via RTB (real-time bidding), and I get access to the exchanges.”


We are a “demand-side platform” or “buy-side platform,” but in fact we often call ourselves a “buyer’s platform.”


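As a buyer’s platform, we operate transparently. Our structure lets agencies and ad networks create the service layer, and when they run on our platform they can create and protect their clients’ proprietary advantage. Instead of trying to compete with other companies, we are trying to be the ad tech company that powers other ad tech companies. That is the difference. Most DSPs compete with ad networks and ad agencies, but we don’t compete with them.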




What’s your thought on Google’s position right now in the marketplace? Is it a fait accompli? Have they got it wrapped up?

No, and I know at times I’ve somewhat been in the minority on this. It’s not that we shouldn’t keep our eye on Google, but display and search are different. People often want to apply what’s happened in search to display. It just can’t happen that way. The primary reason is that in search, Google is the publisher 75% of the time. In display, they rarely are the publisher. The argument people often make is that Google has millions of advertisers plugged into them on the demand side and millions of publishers on the supply side, and as a result the display ecosystem is theirs to lose. Unfortunately for Google, most of the volume doesn’t come from the long tail, and they have to keep paying publishers well.

I think the Admeld acquisition was a brilliant move for Google. But there is also risk. If Google takes too much margin (which is what’s required to really move the needle for investors accustomed to the high margins Google makes in search), then those publishers will eventually end up at other ad exchanges.

If Google decides to subsidize the business and accept lower margins, they may be able to sustain what they have. But I don’t see Google ever having more display market share than they have today. Every major publisher is going to fight against Google growing in display.

That said, I’d like to see some of the other could-be-competitors actually compete with Google.






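I think the Admeld acquisition was a brilliant move for Google. But there is also risk. If Google takes too much margin (the high margins it is used to in search), publishers will end up moving to other ad exchanges. If Google decides it must accept lower margins, it may be able to maintain what it has. But I don’t think Google will ever have more display market share than it has today. Every major publisher will fight against Google’s growth in display.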


(Related keywords: ad tech / DSP / ad exchange / RTB)


Learn about online behavioral advertising, privacy, cookies, and how this all works! (NAI link collection)

Wednesday, January 4, 2012



Learn about online behavioral advertising, privacy, cookies, and how this all works!



A way to support the websites and products you care about



Advertising supports most of the free content people enjoy viewing online.

Allowing cookies and online advertising to personalize services to your browser can be a great benefit to the websites you support, by helping them earn revenue.

It can also be a great benefit to you by helping ensure that you don’t see too much repetition or irrelevant advertising while you surf the web. To read more about the value of advertising, you can check out the following:








On Cookies



What are HTTP cookies? How are they used? How can I control what cookies are set on my computer? Learn this and much more about cookies via the following links:




On Regulation, Self-Regulation & Accountability



Online privacy and behavioral advertising are hot topics! Read what regulators and business groups are thinking about these issues:


From business groups, technologists, and self-regulatory bodies:




(Related keywords: NAI, FTC, HTTP cookies, IAB)

What are the best similarity search engines? (Quora, Japanese translation)

Tuesday, January 3, 2012



What are the best similarity search engines?


How do you find people with similar interests on the web (not limited to FB, Quora, LinkedIn)? How do you find similar cars, music, videos, clothes, images, products, books (not only on Amazon), electronic parts, web sites, related academic papers, stocks, homes for sale, service providers, etc.?

How do you find people with similar interests on the web (not limited to FB, Quora, LinkedIn)?



You can cluster queries in user query logs to suggest similar queries. I suggest looking at the work of Microsoft researcher Ji-Rong Wen.

Clustering the queries in user query logs lets you recommend similar queries. I recommend the papers of the Microsoft researcher.
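As a rough illustration of query clustering, here is a minimal sketch assuming each log record is a (query, clicked URL) pair: two queries count as similar when the sets of URLs users clicked for them overlap (Jaccard index at or above a threshold), and similar queries are merged into clusters with union-find. This is a simplification for illustration, not a reproduction of Ji-Rong Wen's published method; the log format, the 0.5 threshold, and the name `cluster_queries` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Overlap of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_queries(log, threshold=0.5):
    """Group queries whose clicked-URL sets have Jaccard similarity >= threshold.

    log: iterable of (query, clicked_url) pairs (hypothetical log format).
    Returns a sorted list of clusters, each a sorted list of queries.
    """
    clicks = defaultdict(set)
    for query, url in log:
        clicks[query].add(url)

    # Union-find, so that similarity links are merged transitively.
    parent = {q: q for q in clicks}

    def find(q):
        while parent[q] != q:
            parent[q] = parent[parent[q]]  # path halving
            q = parent[q]
        return q

    for q1, q2 in combinations(clicks, 2):
        if jaccard(clicks[q1], clicks[q2]) >= threshold:
            parent[find(q1)] = find(q2)

    clusters = defaultdict(list)
    for q in clicks:
        clusters[find(q)].append(q)
    return sorted(sorted(c) for c in clusters.values())


log = [
    ("cheap flights", "url-1"), ("cheap flights", "url-2"),
    ("low cost airfare", "url-1"), ("low cost airfare", "url-2"),
    ("python tutorial", "url-3"),
]
print(cluster_queries(log))
# [['cheap flights', 'low cost airfare'], ['python tutorial']]
```

Pairwise comparison is quadratic in the number of queries, so on a real log you would only compare queries that share at least one clicked URL.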



I assume you are talking about similarity search of images.


It is a popular image matching site (frequented by quizzers like me anyway). It checks for proper copies of images and its database has grown significantly since its launch.



It is a popular image-matching site. It checks for copies of images, and its database has grown significantly since its launch.
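The answer does not say how the site detects copies. A common technique for this kind of near-duplicate image search is perceptual hashing; the sketch below shows a minimal average hash ("aHash"), assuming the image has already been decoded and shrunk to a small grayscale grid by an imaging library. The function names and the 5-bit distance threshold are illustrative assumptions, not the site's actual method.

```python
def average_hash(pixels):
    """Average hash of a small grayscale image.

    pixels: 2-D list of brightness values (0-255), assumed to be an image
    already shrunk to a small fixed size such as 8x8.
    Each bit is 1 if that pixel is brighter than the image's mean.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(h1, h2):
    """Number of bits that differ between two hashes."""
    return bin(h1 ^ h2).count("1")


def looks_like_copy(pixels_a, pixels_b, max_distance=5):
    # Hypothetical threshold: a few differing bits still counts as a copy.
    return hamming(average_hash(pixels_a), average_hash(pixels_b)) <= max_distance


img = [[0, 50, 100, 150], [0, 50, 100, 150], [200, 220, 240, 230], [10, 20, 30, 40]]
brighter = [[p + 10 for p in row] for row in img]
print(looks_like_copy(img, brighter))  # True: a uniform brightness shift keeps the hash
```

Because each bit only records "above or below the mean," uniform brightness or contrast changes leave the hash intact, while a genuinely different image flips many bits.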

What is a good way to create an item-item similarity matrix for a recommendation engine where items aren’t actually rated by users, but rather “used”, “clicked”, “bought” or “played” by users? (Quora, Japanese translation)

Monday, January 2, 2012


What is a good way to create an item-item similarity matrix for a recommendation engine where items aren’t actually rated by users, but rather “used”, “clicked”, “bought” or “played” by users?




This has to do with Collaborative Filtering. Looks like many algorithms assume users have given ratings for the items. I think there are countless scenarios where users do not “rate” an item on some scale, but their behavior provides a “rating” which is just as valuable. I’m looking to figure out how to calculate similarity between items in this scenario.





 I’ve been using the Jaccard Coefficient, and specifically, the Tanimoto Coefficient, both described at http://en.wikipedia.org/wiki/Jaccard_index to calculate item-item similarity. They are both measures of overlap.

The formula is

AB / (A + B - AB)

where AB is the number of times both items were rated (bought) together, A is the number of times item A was bought, and B is the number of times item B was bought.

How I calculate this, in a MapReduce-friendly way:
For each user, generate all of the (itemA, itemB) pairs for all of the items bought, and then keep track of the number of occurrences of each item-item pair.

The hard part is determining AB for each item-item pair. Once you have that figured out, calculating the Tanimoto coefficient is easy (refer to the formula).

Consider this simplified example:
Customer A bought items 1, 2, and 3
Customer B bought items 2, 3, 4, and 6
Customer C bought items 1, 2 and 5

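Items 1 and 2 come out as strongly similar because they were bought together often compared to the number of times they were bought individually, so their Tanimoto (Jaccard) score is 2 / (2 + 3 - 2), about 0.67: AB is 2, while A (item 1) = 2 and B (item 2) = 3. They are not uniquely the most similar pair, though: items 2 and 3 also score about 0.67, and items 4 and 6, which only customer B bought, score 1.0.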

The similarity between items 3 and 6 would be 1 / (2 + 1 – 1), or .50 since they were bought together only once.
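The steps above can be sketched in plain Python on the three-customer example: count how many customers bought each item, count co-occurring pairs per customer, then apply AB / (A + B - AB). This is a single-machine illustration (the name `tanimoto_scores` is hypothetical), not the author's Hadoop job.

```python
from collections import Counter
from itertools import combinations


def tanimoto_scores(baskets):
    """Item-item Tanimoto similarity from implicit 'bought' data.

    baskets: mapping of customer -> set of items bought.
    Returns {(item_a, item_b): AB / (A + B - AB)} for every pair bought together.
    """
    item_counts = Counter()  # A and B: how many customers bought each item
    pair_counts = Counter()  # AB: how many customers bought both items
    for items in baskets.values():
        item_counts.update(items)
        # sorted() gives each unordered pair one canonical key (a < b)
        pair_counts.update(combinations(sorted(items), 2))
    return {
        (a, b): ab / (item_counts[a] + item_counts[b] - ab)
        for (a, b), ab in pair_counts.items()
    }


baskets = {
    "A": {1, 2, 3},
    "B": {2, 3, 4, 6},
    "C": {1, 2, 5},
}
scores = tanimoto_scores(baskets)
for pair in [(1, 2), (2, 3), (3, 6), (4, 6)]:
    print(pair, round(scores[pair], 2))
# (1, 2) 0.67
# (2, 3) 0.67
# (3, 6) 0.5
# (4, 6) 1.0
```

Pairs that were never bought together simply have no entry, which keeps the result sparse.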

How you get the number of times (itemA, itemB) were bought together is up to you. My approach used Python streaming so that I can run it on a Hadoop cluster. I was inspired by Peter Skomoroch’s excellent article on similarity calculations using Python streaming, found at


I’m currently computing item-item similarity on about 10k items using over 3.5 million ‘purchase’ records, and it runs in only a few minutes. When you generate the (item, item) pairs for each item in a user’s history, you will generate a LOT of data, but you can then reduce it when you sum the counts of occurrences.
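The author's streaming code isn't shown, so here is a hedged sketch of what a Hadoop Streaming mapper and reducer for the pair counts could look like. Streaming passes tab-separated lines through stdin/stdout and sorts mapper output by key before the reducer sees it; below, that shuffle is simulated with `sorted()` so the example runs locally. The input format (`user<TAB>item1,item2,...`, one user per line) and the function names are assumptions.

```python
from itertools import combinations, groupby


def mapper(lines):
    """Emit 'itemA,itemB<TAB>1' for each item pair in a user's history.

    Assumed input format: 'user<TAB>item1,item2,...' (one user per line).
    """
    for line in lines:
        _user, items = line.rstrip("\n").split("\t")
        for a, b in combinations(sorted(set(items.split(","))), 2):
            yield f"{a},{b}\t1"


def reducer(sorted_lines):
    """Sum counts per pair; input must be sorted by key, as Hadoop guarantees."""
    parsed = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    # Local simulation of map -> shuffle/sort -> reduce.
    users = ["A\t1,2,3", "B\t2,3,4,6", "C\t1,2,5"]
    for out in reducer(sorted(mapper(users))):
        print(out)
```

On a real cluster, the mapper and reducer would each be a small script reading `sys.stdin` and printing to stdout, passed to the streaming jar with its `-mapper` and `-reducer` options; the per-item counts A and B can be produced the same way with single-item keys.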

A white-paper on this can be found at http://www.infosci.cornell.edu/weblab/papers/Bank2008.pdf


AB / (A + B - AB)

For each user, generate every item-item pair from the items they bought, then count the number of occurrences of each item-item pair.



As for how to compute the number of times each item pair was bought together, one approach is Hadoop Streaming with Python.



Perform cohort analysis on how often these different events lead to one another. Then you’ll be able to normalize the “downstream” likelihood (i.e., conditional probability) of events relative to each other. For example, if one “clicked” event has a 0.03 probability of leading to a “bought” event, you can normalize 1,000 “clicked” events to an expected value of 30 “bought” outcomes (1,000 × 0.03).
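A minimal sketch of that normalization, assuming per-user, chronologically ordered event logs: estimate P(bought | clicked) as the share of users who clicked and later bought, then express any number of clicks as expected purchases. The event names, the log layout, and this per-user cohort definition are assumptions; the answer doesn't prescribe how cohorts are formed.

```python
def conversion_rate(user_events, source="clicked", target="bought"):
    """Share of users with a `source` event who later have a `target` event.

    user_events: mapping user -> chronologically ordered list of event names.
    """
    with_source = 0
    converted = 0
    for events in user_events.values():
        if source in events:
            with_source += 1
            first = events.index(source)
            # Only count a target event that happens after the first source event.
            if target in events[first + 1:]:
                converted += 1
    return converted / with_source if with_source else 0.0


def expected_outcomes(n_events, rate):
    """Express n source events in units of the downstream outcome."""
    return n_events * rate


logs = {
    "u1": ["clicked", "bought"],
    "u2": ["clicked"],
    "u3": ["viewed", "clicked", "clicked"],
    "u4": ["bought", "clicked"],  # bought before clicking: not a conversion
}
rate = conversion_rate(logs)
print(rate)                           # 0.25
print(expected_outcomes(1000, rate))  # 250.0
```

With a rate of 0.03, the same call turns 1,000 clicks into about 30 expected purchases, which lets clicks and purchases be compared on one scale.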




I think you might be interested in looking at this paper:
Y. Hu, Y. Koren, C. Volinsky. Collaborative Filtering for Implicit Feedback Datasets. IEEE International Conference on Data Mining (ICDM) 2008.


It includes a very thorough discussion of how to deal with implicit data (like clicks, views, etc); where, unlike ratings, you (a) don’t have negative feedback, and (b) the feedback you obtain can be used to measure the confidence you have that someone likes something, rather than their preference.
They also give some detailed descriptions of algorithms for this kind of data, ranging from neighborhood to latent factor models.
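To make the paper's central idea concrete, here is a compact NumPy sketch of its confidence-weighted alternating least squares as we understand it: raw counts r_ui become a binary preference p_ui (1 if r_ui > 0) and a confidence c_ui = 1 + alpha * r_ui, and user and item factors are solved alternately with regularized least-squares updates. It uses dense matrices and skips the paper's speed-ups, so it only suits toy data; the defaults (alpha=40, reg=0.1, 2 factors) are illustrative assumptions.

```python
import numpy as np


def implicit_als(R, factors=2, alpha=40.0, reg=0.1, iterations=15, seed=0):
    """Confidence-weighted ALS for implicit feedback (dense toy version).

    R: (n_users, n_items) array of raw counts such as clicks or plays.
    Returns user factors X and item factors Y; X @ Y.T approximates preference.
    """
    n_users, n_items = R.shape
    P = (R > 0).astype(float)  # preference p_ui: did the user interact at all?
    C = 1.0 + alpha * R        # confidence c_ui: grows with the amount of interaction
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(n_users, factors))
    Y = rng.normal(scale=0.1, size=(n_items, factors))
    eye = np.eye(factors)
    for _ in range(iterations):
        # Fix Y and solve each user: x_u = (Y^T C_u Y + reg*I)^-1 Y^T C_u p_u
        for u in range(n_users):
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + reg * eye, Y.T @ Cu @ P[u])
        # Fix X and solve each item the same way
        for i in range(n_items):
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + reg * eye, X.T @ Ci @ P[:, i])
    return X, Y


# Toy data: users 0-1 play items 0-1, users 2-3 play items 2-3.
R = np.array([[3, 1, 0, 0],
              [2, 4, 0, 0],
              [0, 0, 5, 1],
              [0, 0, 1, 2]], dtype=float)
X, Y = implicit_als(R)
print(np.round(X @ Y.T, 2))

# Item-item similarity: cosine between the learned item factor vectors.
Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
item_sim = Yn @ Yn.T
```

Note how unobserved entries are not treated as missing: they get preference 0 with the minimum confidence of 1, which is how the method copes with the lack of explicit negative feedback.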






It’s entirely reasonable to map these behaviors to scalar values: maybe a page view is 0.1, a video play is 0.3, and a favorite is 1.0. Then you can apply any technique that operates on rating values. Picking the right values is up to your domain and your intuition, though you could even use machine learning techniques to figure out the optimal weights!


Picking the right values depends on your domain and your intuition, though you could even use machine learning techniques to compute the optimal weights!
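A minimal sketch of that mapping, using the example weights from the answer (0.1 per page view, 0.3 per video play, 1.0 per favorite). Summing repeated events is an assumption; once each user-item pair has a score, any rating-based technique applies, and item-item cosine similarity is used here as one example.

```python
import math
from collections import defaultdict

# Example weights from the answer; tune them for your own domain.
EVENT_WEIGHTS = {"view": 0.1, "play": 0.3, "favorite": 1.0}


def implicit_ratings(events):
    """Turn (user, item, event_type) records into user -> {item: score}.

    Repeated events are summed (an assumption; you might cap them instead).
    """
    ratings = defaultdict(lambda: defaultdict(float))
    for user, item, kind in events:
        ratings[user][item] += EVENT_WEIGHTS[kind]
    return ratings


def item_cosine(ratings, a, b):
    """Cosine similarity between items a and b over users' score vectors."""
    dot = sum(r.get(a, 0.0) * r.get(b, 0.0) for r in ratings.values())
    norm_a = math.sqrt(sum(r.get(a, 0.0) ** 2 for r in ratings.values()))
    norm_b = math.sqrt(sum(r.get(b, 0.0) ** 2 for r in ratings.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


events = [
    ("u1", "song_x", "play"), ("u1", "song_y", "favorite"),
    ("u2", "song_x", "play"), ("u2", "song_y", "play"),
    ("u3", "song_z", "view"),
]
ratings = implicit_ratings(events)
print(round(item_cosine(ratings, "song_x", "song_y"), 2))  # 0.88
print(item_cosine(ratings, "song_x", "song_z"))            # 0.0
```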



One way would be to have separate matrices for each type of action.
You could interpolate the results from each to get your final output. Obviously “bought” would have a much higher weight than “clicked”… Instead of interpolation you could also use something like a backoff model.
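A hedged sketch of both options, assuming you have already computed one item-item similarity table per action type, plus a table of how many users co-performed each action on each pair. Interpolation takes a weighted average (weights summing to 1); the backoff version, a crude analogue of language-model backoff, uses the most trusted action's score when it has enough support and otherwise falls back to the next action. The weights, the action order, and the `min_support` cutoff are illustrative assumptions.

```python
def interpolate(sims, weights, pair):
    """Weighted average of per-action similarities for one item pair.

    sims: action -> {(item_a, item_b): similarity}
    weights: action -> weight (assumed to sum to 1)
    """
    return sum(w * sims[action].get(pair, 0.0) for action, w in weights.items())


def backoff(sims, support, order, pair, min_support=5):
    """Use the first action in `order` with enough co-occurrences for `pair`.

    support: action -> {(item_a, item_b): number of users with both items}
    order: actions from most to least trusted, e.g. ["bought", "clicked"]
    """
    for action in order:
        if support[action].get(pair, 0) >= min_support:
            return sims[action].get(pair, 0.0)
    return 0.0


sims = {"bought": {("a", "b"): 0.8}, "clicked": {("a", "b"): 0.4, ("a", "c"): 0.6}}
support = {"bought": {("a", "b"): 12}, "clicked": {("a", "b"): 40, ("a", "c"): 30}}
weights = {"bought": 0.7, "clicked": 0.3}  # "bought" trusted far more than "clicked"

print(interpolate(sims, weights, ("a", "b")))                   # about 0.68
print(backoff(sims, support, ["bought", "clicked"], ("a", "c")))  # 0.6, from "clicked"
```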


How does one determine similarity between people online? (Quora, Japanese translation)

Sunday, January 1, 2012



How does one determine similarity between people online?


How do you quantify similarities between people using shopping sites, social networks, Q&A sites, etc.? (E.g., how do Netflix, Amazon, PayPal, Twitter, LinkedIn, Facebook, and Quora calculate user similarity or assign users to groups?)





Here are some educated guesses about what serves as a basis for their work:

(1) Use vector models. For each user, look at the text he wrote.

Build a tf.idf vector. See http://en.wikipedia.org/wiki/TF_IDF for details.

Then compute the cosine between any two users.

You can supplement this with fancier techniques such as LSI (latent semantic indexing).

If you want to implement a similarity measure based on text, you can look at the org.apache.lucene.search.similar package in Lucene (open source).

(2) Networking sites such as Facebook might simply look at how many friends you have in common and compute something like the Jaccard index.
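Both guesses are easy to sketch. Below is a bare-bones TF-IDF (raw term frequency times log(N / document frequency), one of several common weightings) with cosine similarity between users' texts, plus the Jaccard index over friend sets. The whitespace tokenizer and the toy data are assumptions; a real system would use proper tokenization, or Lucene as the answer suggests.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: user -> text. Returns user -> {term: tf * idf}."""
    tokens = {u: text.lower().split() for u, text in docs.items()}
    # Document frequency: in how many users' texts each term appears.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    n = len(docs)
    return {
        u: {t: c * math.log(n / df[t]) for t, c in Counter(toks).items()}
        for u, toks in tokens.items()
    }


def cosine(v1, v2):
    """Cosine of the angle between two sparse vectors (dicts)."""
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def jaccard(friends_a, friends_b):
    """Shared friends divided by all friends of either person."""
    union = friends_a | friends_b
    return len(friends_a & friends_b) / len(union) if union else 0.0


docs = {
    "alice": "hadoop python streaming jobs",
    "bob": "python hadoop cluster tuning",
    "carol": "baking sourdough bread",
}
vectors = tfidf_vectors(docs)
print(round(cosine(vectors["alice"], vectors["bob"]), 2))  # 0.12
print(cosine(vectors["alice"], vectors["carol"]))          # 0.0
print(jaccard({"x", "y", "z"}, {"y", "z", "w"}))           # 0.5
```

Terms that every user writes get an IDF of zero and drop out, which is exactly why TF-IDF beats raw word overlap for this purpose.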




(1) Use vector models. For each user, build a TF-IDF vector from the text they wrote.





(Related keywords: LSI / Jaccard Index)





A good place to start is Programming Collective Intelligence.


Whole books have been written on this topic, so a few words can’t come anywhere near a full answer.

If you use the collaborative filtering approach, find how many actions (e.g., products purchased, likes, friends) two people have in common. To make it work well, though, you have to assign weights to the shared actions: the correlation value between two people is the sum of the weights of their shared actions. The tricky part is assigning good weights. A weight has to be relative to the two people, to the cluster the two people belong to, and to the whole population.
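The answer leaves the weighting scheme open. One well-known choice that makes the weight relative to the whole population is to down-weight popular actions, as the Adamic-Adar index does: each shared action contributes 1 / log(number of people who performed it). The sketch below uses that weighting as an assumption; it does not model the cluster-level weighting the answer also mentions.

```python
import math
from collections import Counter


def adamic_adar(actions, a, b):
    """Similarity of people a and b as a weighted sum of their shared actions.

    actions: person -> set of actions (products bought, pages liked, friends...).
    Each shared action is weighted by 1 / log(popularity), so a rare shared
    action counts for more than one that almost everybody has.
    """
    popularity = Counter(x for acts in actions.values() for x in acts)
    shared = actions[a] & actions[b]
    # Every shared action has popularity >= 2, so log() is never zero.
    return sum(1.0 / math.log(popularity[x]) for x in shared)


actions = {
    "ann": {"bestseller", "rare_book"},
    "ben": {"bestseller", "rare_book"},
    "cat": {"bestseller"},
    "dan": {"bestseller"},
}
print(round(adamic_adar(actions, "ann", "ben"), 2))  # 2.16: the rare book dominates
print(round(adamic_adar(actions, "cat", "dan"), 2))  # 0.72: only the bestseller
```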








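But the tricky part is assigning appropriate weights to the actions. The weight has to be relative to the two people, the cluster of the two people, and the whole population.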



Recommendation mining is very useful for determining similarity between people who use shopping sites, social networks, and Q&A sites, on the basis of their likes, choices, and online behavior.

Some popular recommendation engines are:


You can also find similarity in the text that authors post on Q&A sites, social networks, blogs, etc. For that, you can try the following techniques:

  • Jaccard Index
  • Simrank
  • Vector space model with Cosine Similarity
  • Sorensen similarity index
  • BM25 Okapi model
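Two of the listed measures are sketched below for illustration: the Sorensen (Sorensen-Dice) index, 2|A ∩ B| / (|A| + |B|) over sets of words, and Okapi BM25 scoring each text against a query. The parameters k1 = 1.2 and b = 0.75 are commonly used defaults, and the IDF form log(1 + (N - n + 0.5) / (n + 0.5)) is the non-negative variant Lucene uses; both are assumptions rather than part of the original answer, as is the whitespace tokenizer.

```python
import math
from collections import Counter


def sorensen_dice(text_a, text_b):
    """Sorensen-Dice index over the sets of words in two texts."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each document (a list of texts) for a query string."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) with document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["python hadoop streaming", "baking bread at home", "hadoop cluster setup with python"]
print([round(s, 2) for s in bm25_scores("hadoop python", docs)])
print(round(sorensen_dice("a b c", "b c d"), 2))  # 0.67
```

The shorter first document outscores the longer third one for the same query terms, which is the length normalization controlled by b.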







  • Jaccard coefficient
  • Simrank
  • Vector space model with cosine similarity
  • Sorensen similarity index
  • BM25 Okapi model

(Related keywords: Jaccard Index / Simrank / Vector space model with Cosine Similarity / Sorensen similarity index / BM25 Okapi model)