Non-structural protein, RNA dependent RNA polymerase, docking, Andrographolide.


Hepatitis C virus infections represent primary public health concern. It is estimated that 3 to 4 million people are chronically infected globally. HCV infection increases the risk of liver cirrhosis and hepatocellular carcinoma development [1]. HCV infected individuals are treated with interferon, Ribavirin (a nucleoside analog) combination and other approved protease inhibitors [2]. Hepatitis C is termed as a silent disease since the incubation period of HCV ranges from 6 to 8 months with no symptoms, making early detection difficult [3]. However, the therapies have limited efficacy against HCV genotypes. HCV has been classified into different strains based on their genetic differences. HCV diverges into 6 genetic variants. Genetic diversity is also reported within the subgroups making genetic heterogeneity responsible for differences in disease outcome and response to treatment among HCV infected individuals [4]. Severe drug regime, side effects and high treatment costs limit patients’ compliance with the treatment. Therefore, a major focus towards development of cost effective and efficient anti-viral therapeutics targeting viral proteins with limited effects on the host system is of high priority.

HCV contains 9.6 kb positive – sense RNA genome and belong to Flaviviridae family. The viral polyprotein is processed into individual structural proteins – Core, E1, E2 and P7 and nonstructural(NS) proteins – NS2, NS3, NS4A, NS4B, NS5A and NS5B [5]. Host cytoplasm acts as a factory for HCV replication. RNA dependent RNA polymerase (RdRp) activity of HCV nonstructural protein NS5B mediates viral RNA genome replication. The importance of NS5B in synthesizing HCV RNA makes it a preferential target for anti-HCV inhibitor development. Furthermore, the host cells lack RdRp activity leading to minimal or no side effect on the host counterpart.

HCV NS5B has a number of enzymatic and structural differences from cellular RNA polymerases, which make it a promising target for small molecule antiviral development [6]. Phytochemicals from medicinal plants shows prominent role in prevention and therapy of many diseases [7]. Our research focuses on the insilico identification of promising bioactive molecules against HCV infections by considering NS5B as a primary target.

Materials and Methods

Identification of protein structure and preparation of small molecules

The NS5B is 65 kDa proteins with a typical ‘right hand’ polymerase shape containing finger, palm and thumb domain. The lambda 1 loop of HCV NS5B, which extends from the fingers to the thumb domain, is unique to RNA-dependent RNA polymerases and the beta-hairpin toward the catalytic region is specific for HCV NS5B. The structure of NS5B encloses motifs A-F, which are crucial for their functional activity [8]. Brookhaven protein data bank (PDB) is a major resource repository of three-dimensional structural information of biological macromolecules like nucleic acids and proteins. The X-ray structure of NS5B protein was retrieved from PDB (PDBID- 2YOJ) [9].

Traditional plant derived medicines are proved to be efficient in the treatment of liver disorders [10]. The bioactive compounds which can act against HCV NS5B were collected from literature [7, 11] and books on Indian medicine (Table 1). The SMILES (Simplified Molecular-Input Line-Entry System) notation of the compounds were retrieved from PubChem [30] and converted to PDB format using CORINA server [31]. The 2D structures of the compounds are presented in the S. Fig. 1.

Table 1

List of phytochemicals used in the study

Sl.No Phytochemical Plant name Family Reference
1 Andrographolide Andrographuis paniculata Acanthaceae [12]
2 Arjungenin Terminalia arjuna Combretaceae [13]
3 Arjunic Acid Terminalia arjuna Combretaceae [13]
4 Berberine Berberis aristata Berberidaceae [14]
5 Chasmanthin Tinospora cordifolia Menispermaceae [15]
6 Columbin Tinospora cordifolia Menispermaceae [16]
7 Desmethylwedelolactone Eclipta alba Asteraceae [17]
8 Esculetin Cichorium intybus Asteraceae [18]
9 Friedelin Azima tetracantha Salvadoraceae [19]
10 Gossypin Hibiscus vitifolius Malvaceae [20]
11 Hyperforin Hypericum perforatum Hypericaceae [21]
12 Hypericin Hypericum perforatum Hypericaceae [21]
13 Hypophyllanthin Phyllanthus amarus Phyllanthaceae [22]
14 Jatrorrhizine Enantia chlorantha Annonaceae [23]
15 Kutkoside Picrorhiza kurroa Plantaginaceae [24]
16 Phyllanthin Phyllanthus amarus Phyllanthaceae [22]
17 Picroside Picrorhiza scrophulariiflora Scrophulariaceae [25]
18 Pseudohypericin Hypericum perforatum Hypericaceae [21]
19 Silymarin Silybum marianum Asteraceae [26]
20 Sitosterol Melothria heterophylla Cucurbitaceae [27]
21 Swertisin Swertia chirayita Gentianaceae [28]
22 Thaliporphine Mahonia leschenaultia Berberidaceae [29]
23 Tinosporide Tinospora cordifolia Menispermaceae [16]
24 Wedelolactone Eclipta alba Asteraceae [17]

Binding site prediction – motif conservation

The full genome sequences of eighteen HCV strains belonging to six approved HCV genotypes were retrieved from NCBI [32] (Table 2). Considering the NS5B sequence of 2YOJ.pdb as the reference sequence all other sequences were multiply aligned to identify the conservation of functionally significant motifs A-D using CLC workbench ( The identified conserved motifs were considered as the ligand binding site.

Table 2

Genbank accession numbers of complete protein sequence of HCV genome

Genotype GB accession number Genotype GB accession number
1a AAB66324.1 3k BAA09890.1
1b Q9WMX2 4a CAA72338.1
1c BAA03581.1 5a CAA73640.1
2a AAY24373.1 6a CAA72801.1
2b AAP55704.1 6b BAA07103.1
2c BAA08911.1 6d AAZ85046.1
2k BAA88057.1 6g BAA09891.1
3a AAC03058.1 6h BAA32667.1
3b BAA08372.1 6k BAA32666.1

Molecular docking and ADMET predictions

Molecular docking predicts preferred orientation of the ligand molecule to the protein when bound together to form a stable complex. The scoring function gives the strength of association between them. AutoDock 4.2 [33] was used for docking study with the Lamarkian genetic algorithm to find globally optimized conformation. The grid spacing was set to 0.403 nm and grid box dimension was set to 86 x 60 x 60 which enclosed the residues of motifs A to D. Default settings were applied for remaining parameters. At the end of a docking with multiple runs, a cluster analysis was performed. Docking solutions were clustered together and ranked by the lowest docking energy. The lowest binding energy cluster was selected as the representative binding mode. Out of twenty four bioactive compounds, ten compounds with least binding energy were considered for the next phase of screening.

Absorption, distribution, metabolism, excretion and toxicity (ADME/T) properties receive more concern in rational drug design, as they determine the development of safe orally bioavailable drug. The determination of characteristics of compounds that are more likely to exhibit satisfying ADME/T properties has led to the concept of “druglikeliness”. Drug likeliness properties obeying rule of thumb [34] and molecular properties governing pharmacokinetics of the drugs, including human intestinal absorption, aqueous solubility, plasma protein binding and Ames mutagenicity values [35] were predicted using MolInspiration [36] and ChemSilico server ( These parameters act as a filter to screen out more potential leads.

Biological activity profiles are precise indicators not only for molecular properties, but also for biological response of the molecules. Biospectrum connects chemical scaffolds with biological activity. PASS (prediction of activity spectra for biologically active substances) server [37] predicts the pharmacological effects and biochemical mechanisms of biologically active substances based on their structural formula. The biological activity spectrums for compounds emerging out of screening processes were predicted. The pass activity score for hepato-protectant activity (HPA), hepatic disorder treatment (HPT) and antiinflammatory activity (AIA) were taken under consideration for further analysis.

Molecular dynamics simulation

The molecular dynamics simulation (MDS) calculates the time dependent behavior of the molecular system. They are used to investigate the thermodynamics of biological macromolecules and their complexes. MDS was performed using GROMACS 4.5.3 [38]. Gromos96 forcefield [39] was used to prepare the protein topology file. Ligand topology and forcefield parameter file were prepared using PRODRG server [40]. The full system was subjected to 5 ns MDS at 300 K temperature and 1 bar pressure. The best protein ligand complex was subjected for molecular dynamics simulation study.

Results and Discussion

NS5B structural and functional analysis

The hydrophilic nature of NS5B was predicted as – 0.201 based on a Gravy index (grand average of hydropathy) by Expasy ProtParam [41]. RdRp (Fig.1) is a compact globular protein divided into finger, palm and thumb domains. The finger domain has two sub-domains, α finger with seven α helix and β finger with four β strands. The β finger is connected to thumb domain through two loops Ala9-Thr41 (loop 1) and Asn142-Ala157 (loop2). The incoming ribonucleoside tri-phosphates (rNTP) enter through a small hole at the bottom of loop 2 to reach the binding cleft. The palm domain is a combination of three anti-parallel β strands and four helices [6]. It forms the most important catalytic center of RdRP and contains highly conserved motifs A to F. Seven α helices and an anti-parallel β strand constitutes the thumb region. The processed RNA duplex is released with the help of the β sheet in the thumb domain [42].

The multiple sequence alignment of NS5B protein sequences from six genotypes [43, 44] and their sub-types approved by ICTV (International Committee for the Taxonomy of Viruses) were proved to be conserved in RNA dependent polymerases by MSA. Motif A (Asp220-Asp225), motif C (Gly317-Asp319) and motif D (Ala342- Tyr346) are involved in nucleotide tri-phosphate binding (NTP) and catalysis while motif B (Ser282- Asn291) is involved in template-primer positioning. The conserved patterns of motifs A-D are highlighted in Fig.2. The residues involved in divalent cation binding, substrate binding, template selection and differentiation remain highly conserved thus validating the binding site preferences of the drugs.

Molecular Docking

To identify novel candidate inhibitors of HCV polymerase protein, the twenty four compounds were docked in the catalytic center of the palm region. Different binding poses of twenty four compounds were searched and ranked based on their binding affinity. Top ten compounds (andrographolide, esculetin, tinosporide, columbin, kutokoside, arjunic acid, chasmanthin, silymarin, gossypin and hypercin) were identified based on the AutoDock ranking (Fig.3). The binding energy distribution of the first ten compounds ranged from -5.42 kcal/mol to -10.43 Kcal/mol. The Binding free energy and binding interaction patterns of all the ligands were analyzed using Discovery Studio Visualizer and PyMol. The free binding energy and interaction patterns of the ligands represented in Table 3 are considered as the basic criteria for efficient activity. The binding mode analysis of the compounds with NS5B provides a better insight into the essential residue interactions.

Cation-п interactions [45] are strong, non-covalent binding force conferring stability in ligand-receptor complex. Strong cation-п interaction was reported among the aromatic group of the ligands with the residues Arg158, Lys141. The lower end of the β finger of NS5B protein forms a rim of basic amino acids (Arg48, Lys51, Lys141, Lys155 and Arg158) to anchor negatively charged rNTPs and Arg158 plays a key role in rNTP binding. The blockage of this region can inhibit the RdRp activity of the protein. HCV NS5B shows a strong preference towards RNA as a template and rNTPs as substrate. Asp225 of motif A and Ser288 of motif B are substrate discriminating residues. It should be noted that, many compounds under our analysis form hydrogen bonds and electrostatic interactions with the above residues, silencing their pivotal function. A hydrophobic pocket formed by Asp225 of motif A and Ser282, Thr281 and Asn291 of motif B accommodates ribose moiety of rNTP. Interaction with these residues further hinders the protein’s functionality. Majority of the ligands interactions were observed with the catalytic aspartate residues Asp220, Asp318 and Asp319. The C-terminal region constitutes the regulatory part of HCV NS5B protein. Ser556 and Ser543 are phosphorylated via casein kinase-2. Upon phosphorylation RdRp becomes more active. Interaction of the ligands with any one of these residue can inactivate RdRp functional propagation. .

Fig. 1: Three dimensional structure of HCV NS5B polymerase (2YOJ) with finger, palm and thumb domains. The domains are colored to emphasize their structural significance. Purple: α fingers, green: β fingers, cyan: palm and pink: thumb domains. The cartoon representation of motifs A – D.

Fig. 2: Multiple sequence alignment of the RdRp sequences of 6 different genotypes from region 200-400 depicting the conservation patterns. The motifs A: D220, T221, R222, C223, F224 and D225.motif B: S282, G283, V284, L285, T286, T287, S288, C289, G290 and N291. Motif C: G317, D318 and D319 and motif D: A342, M343, T344, R345 and Y346 are enclosed within a rectangular box. The amino acids are colored based on their nature.

Fig. 3: The successors from first screening phase. Interaction of phytochemicals with the motifs A-C is highlighted. M-A*: interaction with motif A residues; M-A & B*: interaction with motifs A and B residues; M- B& C* interaction with motif B and C residues; M- C*: interaction with motif C residues. (The compounds in our study did not show preferential binding towards the residues of motif D, hence not shown in Fig).


Binding energy and interacting residues of all the 24 ligands

Sl.No Compounds Binding Energy (Kcal/mol) Residues involved in interactions
1 Andrographolide -10.43 Thr287, Asn291, Asp318
2 Esculetin -9.91 Ser288, Asn291, Gly317(?-?), Asp318, Tyr555
3 Columbin -8.96 Lys141, Arg158(?-?), Asn291
4 Tinosporide -8.6 Lys141, Asn291, Asp318
5 Kutkoside -7.7 Arg158, Tyr219, Thr221, Asp225, Asn291
6 Arjunic Acid -7.66 Arg158, Ser226
7 Chasmanthin -7.57 Lys141, Asp225, Asn291, Asp318
8 Silymarin -7.54 Tyr219, Asp319, Leu320, Ser556
9 Gossypin -7.24 Arg158(+-?), Cys223, Asp225, Asn291, Asn316, Asp318, Asp319, Ser367
10 Hypericin -7.14 Lys141(+-?), Arg158, Ser282, Asp318
11 Picroside -6.92 Asp318, Asp319, Ser556
12 Swertinin -6.84 Arg158 (+-?), Ser226, Asp318
13 Sitosterol -6.8 Ser556
14 Hyperforin -6.67 Asp225, Asn291
15 Wedelolactone -6.65 Arg158(+-?), Asp225, Ser282, Asn291, Asp318
16 Friedelin -6.5 No interaction
17 Berberine -6.32 Arg158(+-?), Asn291
18 Arjungenin -6.08 Asp225, Asp318, Cys366, Ser556
19 Jatrorrhizine -6 Arg158(+-?), Asp318
20 Desmethylwedelolactone -5.94 Arg158(+-?), Asp225, Ser556
21 Pseudohypericin -5.53 Arg158(+-?), Asp225, Asp318, Asn316
22 Phyllanthin -5.42 Lys141, Arg158(+-?)
23 Thaliporphyine -5.04 Arg158 (+-?), Asn291
24 Hypophyllanthin -4.46 Lys141, Arg158, Ser556

Physiochemical property profile and Bioactivity predictions

The subsequent screening phase was based on physiochemical properties of the compounds under study. The ten phytochemical compounds were evaluated against thirteen physiochemical properties which form the backbone of drug designing process. Compounds which satisfy the Lipinski’s rule of five (RO5) and other physiochemical parameters were selected for further analysis.

Hypericin and kutokoside with molecular weight 504.45 Da and 512.46 Da were excluded from the study, as high molecular weight affects drug solubility, absorption and diffusion. Moderate ClogP (Partition coefficient) is desired for GI track absorption [46]. Based on RO5 compounds possessing hydrogen bond donor <5, hydrogen bond acceptor <10 and total hydrogen bonds <=12 were retained for further analysis [35]. Compounds with CSLogD7.4 (Distribution coefficient) in the range of 1-3 shows moderate solubility and permeability along with favorable in-vivo oral absorption and blood brain barrier (BBB) penetration [47]. Gossypin with the distribution coefficient value of -1.5 was excluded from further analysis as it failed to fall within desired range. Chasmanthin with high intrinsic solubility value makes it an unfavorable lead compound [34]. The intact drug binding to plasma protein affects its displacement in the body [35] and silymarin with 98% of plasma binding was eliminated from the study. The molar refractivity value of arjunic acid was found to be 187.72 making it unfavorable for entering the next phase [48]. Polar surface area (PSA) measures drug’s ability to permeate cell [49, 50] and its cutoff were between 60 Å2 and 140Å2 for good permeability. Compound kutokoside showed higher PSA value along with high molecular weight. Aromatic and heteroaromatic rings are ubiquitous features in small-molecule drugs. The rings possess fewer degrees of freedom compared to chains; hence show increased drug-receptor binding energy. Less number of aromatic rings (< 3) favors an oral drug candidate [51]. Hypericin with 6 aromatic rings and high molecular weight was excluded from the screening process.

Improvisation of our previous study on phytochemical screening against NS3 protein [5] was accomplished by performing human intestinal absorption (HIA) and Ames test for crucial selection of phytochemicals against our current target HCV NS5B. Final screening tested HIA [52] and Ames mutagenecity. The compounds which showed positive HIA and Non-Toxic Ames test were finally selected. After a series of crucial screening tinosporide, columbin, andrographolide and esculetin emerged as promising drug candidates satisfying all the screening criteria (Table 4).

Table 4

Physiochemical and pharmacokinetic properties of final hits

C* P1a P2b P3c P4d P5e P6f P7g P8h P9i P10j P11k P12l P13m P14n
C1a 1.7 1.7 -2.8 68.1 127.9 55 86.9 350 5 3 6 0 + NT
C2b 1.3 1.1 -2.4 76.4 53.5 19 70.6 178 4 2 0 1 + NT
C3c 1.7 1.7 -1.7 81.2 113.1 49 98.5 374.2 7 1 1 1 + NT
C4d 1.5 1.5 -3.2 88.7 113.7 48 85.9 358.4 6 1 1 1 + NT

C*: Compounds; C1a: Andrographolide; C2b: Esculetin; C3c : Tinosporide; C4d: Columbin;

P1a: CSLogP-Partition co-efficient; P2b: CSLogD7.4-Distribution co-efficient at pH 7.4; P3c : CSLogSo-aqueous solubility; P4d:Plasma binding (%);P5e: Molar refractivity; P6f : Number of atoms; P7g: Polar surface Area,Å2; P8h: Molecular Weight (Daltons); P9i : Number of hydrogen bond acceptor; P10j : Number of hydrogen bond donor; P11k: Number of rotatable bonds; P12l : Number of rings ; P13m: Human intestinal absorption; P14n: AMES test; NT: non toxic

Bioactivity prediction forms the final phase of screening. The Hepato-protectant, hepatic disorder treatment and anti-inflammatory activity were predicted for the final four hits using the Pass server (Fig.4). The compound with Pa (probability to be active) > 0.7 was considered to exhibit the activity in an experiment by the PASS server. Andrographolide was finally narrowed down as it showed high Hepato-protectant activity (Pa=0.98) and a higher binding energy of -10.43 Kcal/mol (Fig.5).

Fig. 4: Bioactive spectrum plot of top 4 hits (tinosporide, columbin, andrographolide and esculetin).

Fig 5. (a) Binding of Andrographolide with NS5B polymerase. Binding site residues are colored yellow and ligand is represented in ball and stick model. (b) 2D plot showing ligand binding site and interacting residues. Arrows indicate hydrogen bond formation between the ligand and the protein. The surface representation of the protein was generated using PyMol and 2D plot was generated using Discovery Studio Visualizer.

Molecular dynamics simulation

For a stronger and better insight into the validity of screening strategy, andrographolide-NS5B complex was subjected to molecular dynamic simulation study. The stability of the complex was monitored over the entire simulation time. The lowest binding energy conformation was taken as initial conformation and solvated using SPC (single point charge) water molecule and neutralized by adding chlorine ions. In the first equilibration phase a 30 ps molecular dynamics was performed at 300K and the system was relaxed by 1000 steps of conjugate gradient energy minimization. In the second equilibration phase the system was subjected to 2000 ps dynamics at 300 K and 1 bar pressure. Finally a 5 ns MD simulation was performed to relax the protein-ligand complex. The potential energy fluctuations and RMSD of Cα atoms were monitored. The potential energy of the whole system remained constant throughout the simulation process. The protein structural flexibility during the simulation process was evident from the small drifts observed in the RMSD plot. But the RMSD fluctuations converged after 3 ns simulation proving that the stability was conferred in the protein structure (Fig.6)

Fig 6. (a) The RMSD plot obtained by fitting C-alpha of the protein after MD simulation showing convergence after 3 ns. (b) Potential energy plot depicting the stability of the protein.

S. Fig. 1: 2D structures of Phytochemicals used in the study


A high throughput in silico method curtails the time and cost spent in the synthesis and testing of compounds before entering the clinical trials. Modern drugs have their lineage in traditional medicines. The Indian system of medicines based on herbs is gaining importance globally in recent years because of their efficacy, safety, easy accessibility and cost effectiveness. ADMET has become an integrated part of the drug discovery setup, providing guidance in lead selection and optimization. In our study andrographolide, columbin, esculetin and tinosporide have exhibited the characteristics of a lead compound. In addition to the lead like characteristics, andrographolide has shown good bioactive spectrum as hepatoprotectant. On further enhancement these leads can evolve as promising therapeutic agents against HCV induced hepatic disorders.

Conflict of Interest

We have no conflicts of interest to disclose.


We acknowledge VIT University, India for providing computational facility and support throughout the work.