The objective of this task is to tag each user's interests with three labels out of 42 given ones. We model this task with a neural network whose structure is shown in Figure 2. Each blog is represented by a blog embedding [11] computed through convolution and max-pooling layers. A user's content embedding is then obtained as the weighted sum of all of his or her blog embeddings, where the weight of each blog embedding is computed by a self-attention mechanism. The content embedding and the keyword embedding are concatenated into the user embedding, which is finally fed to the output layer.

**Figure 2.** Framework of the CNN model based on weighted blog embeddings in task 2.

In our system, a convolutional neural network (CNN) model is constructed for blog representation instead of a recurrent neural network (RNN), since it captures more global information for indicating user interests and is also more time-efficient. Multi-scale convolutional neural networks [12] have achieved outstanding results in computer vision [13], and TextCNNs, which arrange word embeddings vertically, have also proven highly effective for natural language processing (NLP) tasks [14].

In our CNN model, we treat a blog as a sequence of words \(x=\left[{x}_{1},{x}_{2},\cdots ,{x}_{N}\right]\), where each word is represented by its word embedding vector, and the model returns a feature matrix *S* for the blog. The narrow convolution layer applied to this matrix is based on a kernel \(W\in {R}^{k\times d}\) of width *k*, a nonlinear function *f*, and a bias variable *b*, as described by Equation (6):

\({h}_{i}=f\left(W\cdot {x}_{i:i+k-1}+b\right)\) , (6)

where \({x}_{i:j}\) refers to the concatenation of the word vectors from position *i* to position *j*. In this task, we use several kernel sizes to obtain multiple local contextual feature maps in the convolution layer, and then apply max-over-time pooling [15] to extract the most important features.
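The convolution of Equation (6) followed by max-over-time pooling can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the function name, the choice of tanh as *f*, and the input shapes are assumptions for demonstration.

```python
import numpy as np

def conv_maxpool(word_embeddings, kernels, biases):
    """Narrow convolution over a blog's word-embedding matrix followed by
    max-over-time pooling: one pooled feature per kernel.

    word_embeddings: (N, d) matrix, one row per word.
    kernels: list of (k, d) weight matrices (widths k may differ).
    biases: list of scalar biases, one per kernel.
    """
    N, d = word_embeddings.shape
    features = []
    for W, b in zip(kernels, biases):
        k = W.shape[0]
        # h_i = f(W . x_{i:i+k-1} + b) for each window position i (Eq. 6),
        # with f = tanh here for illustration
        h = np.array([np.tanh(np.sum(W * word_embeddings[i:i + k]) + b)
                      for i in range(N - k + 1)])
        features.append(h.max())  # max-over-time pooling over all positions
    return np.array(features)
```

Using several kernel widths (e.g. 2 and 3) yields one pooled feature per kernel, which together form the blog's dense representation.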

The output is a low-dimensional, dense representation of each single blog. With these, each user's representation becomes computable. As a baseline, we simply average the blog vectors to obtain the content embedding \(c\left(u\right)\) for an individual user:

\(c\left(u\right)=\frac{1}{T}\sum _{i=1}^{T}{h}_{i}\) , (7)

where *T* is the total number of a user's related blogs.
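Equation (7) is a plain arithmetic mean over the blog vectors; a one-line NumPy sketch (function name assumed):

```python
import numpy as np

def content_embedding_mean(blog_vectors):
    # c(u) = (1/T) * sum_{i=1}^{T} h_i  (Equation (7));
    # blog_vectors is a (T, n) matrix, one row per blog
    return np.mean(blog_vectors, axis=0)
```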

However, the source of a blog implies the extent of a user's interest in its topic. For example, a blog may be an article written by the user, a repost of another user's blog, or content shared from another platform. It is natural to pay attention to these blogs in varying degrees when inferring the user's interests. Thus, a self-attention mechanism is introduced, which automatically assigns a different weight to each of a user's blogs after training. The user content representation is then given by the weighted sum of all blog vectors:

\({\alpha }_{i}=\frac{\mathrm{exp}\left({e}_{i}\right)}{\sum _{j=1}^{T}\mathrm{exp}\left({e}_{j}\right)}\) , (8)

\({e}_{i}={v}^{T}tanh\left(W{s}_{i}+U{h}_{i}\right)\) , (9)

\(c\left(u\right)=\sum _{i=1}^{T}{\alpha }_{i}{h}_{i}\) , (10)

where \({\alpha }_{i}\) is the weight of the *i*-th blog, \({s}_{i}\) is the one-hot source representation vector of the blog, \(v\in {R}^{n'}\), \(W\in {R}^{n'\times m}\), \(U\in {R}^{n'\times n}\), \({s}_{i}\in {R}^{m}\), \({h}_{i}\in {R}^{n}\), and *m* is the number of source platforms.
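Equations (8)-(10) can be sketched in NumPy as below. This is an illustrative implementation under the stated shapes, not the authors' code; the function name and random parameter initialization are assumptions (in the actual model, *W*, *U*, and *v* are learned during training).

```python
import numpy as np

def attention_content_embedding(H, S, W, U, v):
    """Source-aware self-attention over blog vectors (Equations (8)-(10)).

    H: (T, n) matrix of blog embeddings, one row h_i per blog.
    S: (T, m) matrix of one-hot source vectors s_i.
    W: (n', m), U: (n', n), v: (n',) -- attention parameters.
    Returns the content embedding c(u) and the attention weights alpha.
    """
    # e_i = v^T tanh(W s_i + U h_i)   (Eq. 9)
    e = np.tanh(S @ W.T + H @ U.T) @ v            # shape (T,)
    # alpha_i = exp(e_i) / sum_j exp(e_j)          (Eq. 8), stabilized softmax
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()
    # c(u) = sum_i alpha_i h_i                     (Eq. 10)
    return alpha @ H, alpha
```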

Once a user's content representation is obtained, it is concatenated with the keyword matrix built from all of the blogs' keywords extracted by our model in task 1. The resulting features are the output of the whole feature-engineering pipeline. Afterwards, an ANN layer trains the user embeddings on the training set and, according to their embeddings, predicts the probability distribution of users' interests over the 42 tags on the validation and test sets.
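The final step (concatenation into a user embedding, then an output layer over the 42 tags) can be sketched as a single dense softmax layer. This is a hypothetical simplification: the paper's ANN layer may have more structure, and the function name, shapes, and top-3 selection are assumptions for illustration.

```python
import numpy as np

def predict_tags(content_emb, keyword_emb, W_out, b_out, top_k=3):
    """Concatenate content and keyword embeddings into the user embedding,
    then map it to a probability distribution over the 42 interest tags
    and return the top_k most probable tag indices."""
    u = np.concatenate([content_emb, keyword_emb])   # user embedding
    logits = W_out @ u + b_out                       # W_out: (42, len(u))
    probs = np.exp(logits - logits.max())            # stabilized softmax
    probs /= probs.sum()
    return probs.argsort()[::-1][:top_k], probs
```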