🐦Special: Graph algorithms behind The Twitter Algorithm

Twitter has recently published some details on their tweet recommendation algorithm (denoted as The Algorithm). Let’s dive into it from the graph learning perspective: it has some interesting features spanning clustering, KG embeddings, ANN, and PageRank.

Data-wise, the GraphJet framework operates on the in-memory Twitter interaction graph, supporting dynamic edge updates and lookup queries. Several algorithms prepare features:

- Graph clustering based on sparse binary factorization (SBF) to mine communities, and then the SimClusters approximate nearest neighbor search library to query for the most similar clusters. There are approximately 145k communities on Twitter and they are updated every few weeks.
- Twitter Heterogeneous Information Network (TwHIN) embedding: this is largely based on the classic TransE for knowledge graph embedding. The KG is a multi-relational graph among Users, Tweets, Ads, and Advertisers. TwHIN learns shallow embeddings for all nodes. For inductive capabilities (building embeddings for newly arrived tweets or users) the model simply aggregates embeddings of neighboring nodes (my 2 cents: NodePiece would fit pretty well into this setup).
- RealGraph models the user interaction graph and outputs the likelihood of two users interacting. A relatively straightforward logistic regression model scores edges on top of RealGraph.
- TweepCred: a PageRank score for users, this is your “influencer” score.

In your feed, 50% of tweets come from your network (RealGraph features) and 50% from out-of-network sources (SimClusters, TwHIN, and Social Graph traversals). 1500 candidates are sent to the ranking models: a lightweight logreg and a heavier 48M-param neural net based on MaskNet. Ranked candidates are subject to filtering and postprocessing.
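
To make the TwHIN and TweepCred ideas above a bit more concrete, here is a toy sketch in plain Python. It is not Twitter's code: the function names, the toy graph, the 2-d embeddings, and the damping factor are all assumptions made for illustration. It shows a TransE-style score (head + relation should land near tail), the "average your neighbors" trick for embedding a new node, and a vanilla power-iteration PageRank (the production TweepCred may well differ from this).

```python
import math

def transe_score(head, relation, tail):
    """TransE plausibility: negative L2 distance between (head + relation) and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def aggregate_neighbors(neighbor_embeddings):
    """Inductive embedding for a new node: element-wise mean of its neighbors' embeddings."""
    count = len(neighbor_embeddings)
    return [sum(dim) / count for dim in zip(*neighbor_embeddings)]

def pagerank(out_edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [nodes it points to]}."""
    nodes = set(out_edges) | {v for targets in out_edges.values() for v in targets}
    size = len(nodes)
    rank = {node: 1.0 / size for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_edges.get(node))
        base = (1 - damping) / size + damping * dangling / size
        new_rank = {node: base for node in nodes}
        for src, targets in out_edges.items():
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank

# Hypothetical toy data: 2-d embeddings and a three-user follow graph.
user_emb = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
follows = [-1.0, 1.0]  # relation vector chosen so that alice + follows == bob

follow_score = transe_score(user_emb["alice"], follows, user_emb["bob"])
new_user = aggregate_neighbors([user_emb["alice"], user_emb["bob"]])

follow_graph = {"alice": ["carol"], "bob": ["carol"], "carol": ["alice"]}
scores = pagerank(follow_graph)
top_user = max(scores, key=scores.get)
```

In this toy graph both alice and bob point at carol, so carol ends up with the highest score, while bob (nobody points at him) only keeps the teleport share.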
Overall, the recommender pipeline runs about 5 billion times a day, so the latency requirements do play a major role in selecting shallow’ish graph models. Check the repos for more details. We’ll leave other peculiarities like “the Elon feature” for other researchers 🙂